Unlock the Power of Text-to-Speech with the Index-TTS Workflow
1. Workflow Overview

This workflow converts text to natural-sounding speech using Index-TTS, with support for voice cloning and audio enhancement. Key features:
Text-to-Speech: Processes long texts (e.g., novels) into fluent speech.
Voice Cloning: Mimics a speaker's timbre from reference audio (e.g., 蔡徐坤.wav).
Noise Reduction: Removes background noise for professional-quality output.
Core Models:
Index-TTS: Main speech-synthesis model (requires the ComfyUI-Index-TTS plugin).
Audio Tools: Noise removal (AudioCleanupNode) and timbre loading (TimbreAudioLoader).
2. Key Nodes & Installation
Node | Function | Installation
---|---|---
IndexTTSNode | Converts text to speech with voice cloning. | Install the ComfyUI-Index-TTS plugin.
TimbreAudioLoader | Loads timbre templates (e.g., 抖音-读小说.wav). | Place audio files in
AudioCleanupNode | Reduces noise (strength 0.7) and enhances audio. | Included in the plugin.
LoadAudio | Loads reference audio (e.g., 蔡徐坤.wav). | Built-in node.
Dependencies:
Index-TTS models (~2-3 GB) are downloaded automatically on first use.
3. Workflow Structure
Group 1: Input & Voice Cloning
Inputs:
Reference audio via LoadAudio.
Timbre template via TimbreAudioLoader.
Steps:
IndexTTSNode generates speech from text (e.g., novel chapters).
Parameters: speed (1.0), emotion (0.8), seed (1155511506).
Group 2: Post-Processing
Input: Raw generated audio.
Steps:
AudioCleanupNode applies noise reduction over a 100-8000 Hz range.
SaveAudio exports a WAV file with the filename prefix audio/ComfyUI.
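This workflow does not describe how AudioCleanupNode works internally, so the sketch below only illustrates what limiting audio to a 100-8000 Hz range means. It is a minimal pure-Python version, assuming float samples in [-1, 1]: a second-order Butterworth high-pass at 100 Hz followed by a low-pass at 8000 Hz, using the biquad formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The function names are hypothetical and this is not the node's actual algorithm.

```python
import math
from typing import List, Sequence, Tuple

Coeffs = Tuple[float, float, float, float, float]


def biquad_coeffs(kind: str, cutoff_hz: float, sample_rate: int,
                  q: float = 1 / math.sqrt(2)) -> Coeffs:
    """Normalized (b0, b1, b2, a1, a2) for an RBJ-cookbook low- or high-pass.

    q = 1/sqrt(2) gives a Butterworth (maximally flat) response.
    """
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b0 = b2 = (1 - cos_w0) / 2
        b1 = 1 - cos_w0
    elif kind == "highpass":
        b0 = b2 = (1 + cos_w0) / 2
        b1 = -(1 + cos_w0)
    else:
        raise ValueError(f"unknown filter kind: {kind!r}")
    a0 = 1 + alpha
    # Divide everything by a0 so the difference equation has a leading 1.
    return b0 / a0, b1 / a0, b2 / a0, (-2 * cos_w0) / a0, (1 - alpha) / a0


def apply_biquad(samples: Sequence[float], coeffs: Coeffs) -> List[float]:
    """Run samples through one biquad section (Direct Form I)."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out: List[float] = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def band_limit(samples: Sequence[float], sample_rate: int,
               low_hz: float = 100.0, high_hz: float = 8000.0) -> List[float]:
    """Keep roughly low_hz..high_hz: high-pass first, then low-pass."""
    if not 0 < low_hz < high_hz < sample_rate / 2:
        raise ValueError("need 0 < low_hz < high_hz < sample_rate / 2 (Nyquist)")
    high_passed = apply_biquad(samples, biquad_coeffs("highpass", low_hz, sample_rate))
    return apply_biquad(high_passed, biquad_coeffs("lowpass", high_hz, sample_rate))
```

For example, `band_limit(samples, 24000)` keeps a 1 kHz voice component almost untouched while removing DC offset and strongly attenuating an 11 kHz tone; the 24 kHz rate is an assumption, so pass your audio's real sample rate (an 8000 Hz upper edge needs a rate above 16 kHz). Second-order slopes are gentle (12 dB per octave), so content just outside the range is reduced rather than removed.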
Group 3: Preview & Output
Preview: Listen via PreviewAudio.
Output: WAV file (e.g., ComfyUI_20240513_142301.wav).
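To make the wiring between the three groups concrete, here is a minimal sketch of the same graph in ComfyUI's API (JSON) format, where each node is keyed by an ID and a link is written as [source_node_id, output_index]. Only the class_type names and parameter values come from this workflow; the node IDs, every input name, and the choice to preview the cleaned (rather than raw) audio are assumptions, so export your own graph in API format to get the real names. The optional TimbreAudioLoader branch is left out, and the default server address is assumed to be ComfyUI's local default.

```python
import json
import urllib.request


def build_workflow(text: str, reference_wav: str, seed: int = 1155511506) -> dict:
    """API-format graph: node ID -> {"class_type": ..., "inputs": {...}}.

    Links are [source_node_id, output_index]. All input names are
    hypothetical placeholders; check them against your own export.
    """
    return {
        # Group 1: input and voice cloning
        "1": {"class_type": "LoadAudio", "inputs": {"audio": reference_wav}},
        "2": {
            "class_type": "IndexTTSNode",
            "inputs": {
                "text": text,
                "reference_audio": ["1", 0],  # output 0 of node "1"
                "speed": 1.0,
                "emotion": 0.8,
                "seed": seed,
            },
        },
        # Group 2: post-processing
        "3": {"class_type": "AudioCleanupNode",
              "inputs": {"audio": ["2", 0], "strength": 0.7}},
        "4": {"class_type": "SaveAudio",
              "inputs": {"audio": ["3", 0], "filename_prefix": "audio/ComfyUI"}},
        # Group 3: preview
        "5": {"class_type": "PreviewAudio", "inputs": {"audio": ["3", 0]}},
    }


def build_prompt_request(workflow: dict,
                         server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Prepare (but do not send) a POST /prompt request that queues the graph."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a ComfyUI server running, passing the request to `urllib.request.urlopen` would queue the job; the reference file must already be in ComfyUI's input folder for LoadAudio to find it.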
4. Inputs & Outputs
Inputs:
Text: Supports long texts (example: 4-chapter novel).
Reference Audio: Clear voice sample (≥10 sec recommended).
Timbre Template (optional): Style template (e.g., 抖音-读小说.wav, a "Douyin novel-reading" style).
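Python's standard `wave` module is enough to check the 10-second recommendation before loading a reference clip. A minimal sketch, assuming the reference is an uncompressed PCM WAV file (`wave` cannot read MP3 or FLAC); the function names and the returned tuple are hypothetical choices.

```python
import wave
from typing import Tuple

# Recommended minimum length for the reference clip (from the Inputs list).
MIN_REFERENCE_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path: str,
                    min_seconds: float = MIN_REFERENCE_SECONDS) -> Tuple[bool, float]:
    """Return (is_long_enough, duration_in_seconds) for a reference clip."""
    duration = wav_duration_seconds(path)
    return duration >= min_seconds, duration
```

For example, `check_reference("蔡徐坤.wav")` returns `(False, 6.5)` for a 6.5-second clip, which is a cue to record or splice a longer, clean sample before cloning.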
Output:
WAV file saved in ComfyUI/audio.
.
5. Notes
VRAM:
Index-TTS needs about 4 GB of VRAM; if memory runs out, split long texts into shorter segments and synthesize them one at a time.
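Splitting at sentence boundaries keeps each synthesis call small without cutting words mid-sentence. A naive sketch, assuming a 200-character budget per chunk (an arbitrary value, not a documented Index-TTS limit) and treating 。！？ and .!? as sentence ends; decimals such as 3.14 would also be split, so adapt the pattern to your text.

```python
import re
from typing import List

# One "sentence" = text up to and including a run of sentence-ending marks
# (Chinese 。！？ or ASCII .!?) plus trailing whitespace, or a final fragment
# with no terminal mark. Concatenating all matches rebuilds the input text.
_SENTENCE = re.compile(r"[^。！？.!?]*[。！？.!?]+\s*|[^。！？.!?]+")


def split_text(text: str, max_chars: int = 200) -> List[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for piece in _SENTENCE.findall(text):
        # Start a new chunk if this sentence would overflow the current one.
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = ""
        # A single sentence longer than the budget is cut at max_chars.
        while len(piece) > max_chars:
            chunks.append(piece[:max_chars].strip())
            piece = piece[max_chars:]
        current += piece
    if current.strip():
        chunks.append(current.strip())
    return [chunk for chunk in chunks if chunk]
```

Each returned chunk can be fed to IndexTTSNode in turn and the resulting audio concatenated; the hard cut for over-long sentences is a fallback, so a smaller budget only makes sense if your chapters use regular punctuation.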
Quality Tips:
Adjust frequency_range in AudioCleanupNode to preserve voice clarity.
Voice Control:
Change the seed in IndexTTSNode to get different voice variations.
Debugging:
Avoid special characters in Chinese text to prevent garbled speech.
Pre-clean noisy reference audio for better cloning.
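The workflow does not define which characters count as "special". A minimal sketch, assuming a whitelist of CJK Unified Ideographs (U+4E00 to U+9FFF), ASCII letters and digits, whitespace, and common Chinese and English punctuation, removes everything else before the text reaches IndexTTSNode. The whitelist and the function name are assumptions; extend them (for example with rarer CJK extension blocks) if your text needs more.

```python
import re

# Punctuation kept as-is (Chinese full-width and common ASCII marks).
_KEPT_PUNCTUATION = set("，。！？、；：“”‘’（）《》,.!?;:'\"()-")


def _is_allowed(ch: str) -> bool:
    return (
        "\u4e00" <= ch <= "\u9fff"           # CJK Unified Ideographs
        or (ch.isascii() and ch.isalnum())    # ASCII letters and digits
        or ch.isspace()
        or ch in _KEPT_PUNCTUATION
    )


def sanitize_text(text: str) -> str:
    """Drop characters outside the whitelist, then collapse leftover spaces."""
    kept = "".join(ch for ch in text if _is_allowed(ch))
    return re.sub(r"[ \t]+", " ", kept).strip()
```

Emoji, decorative symbols such as ★, and full-width signs such as ＃ are dropped, while ordinary sentence punctuation survives so the model still gets its pause cues.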