1. Workflow Overview

Text-to-Speech: Processes long texts (e.g., novels) into fluent speech.
Voice Cloning: Mimics speaker timbre from reference audio (e.g., 蔡徐坤.wav).
Noise Reduction: Cleans background noise for professional output.

Core Models:

Index-TTS: Main model for speech synthesis (requires plugin ComfyUI-Index-TTS).
Audio Tools: Noise removal (AudioCleanupNode), timbre loading (TimbreAudioLoader).

2. Key Nodes & Installation

| Node | Function | Installation |
| --- | --- | --- |
| IndexTTSNode | Converts text to speech with voice cloning. | Install plugin `ComfyUI-Index-TTS` (GitHub: `chenpipi0807/ComfyUI-Index-TTS`). |
| TimbreAudioLoader | Loads timbre templates (e.g., `抖音-读小说.wav`). | Place audio files in `ComfyUI/input`. |
| AudioCleanupNode | Reduces noise (strength 0.7) and enhances audio. | Included in plugin. |
| LoadAudio | Loads reference audio (e.g., `蔡徐坤.wav`). | Built-in node. |

Dependencies:

The Index-TTS model files (about 2-3 GB) download automatically on first use.
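A minimal sketch of the plugin installation step, assuming the usual ComfyUI convention that custom nodes live in a `custom_nodes` folder under the ComfyUI root. The helper name `clone_command` and the root path `ComfyUI` are illustrative, and the URL is simply the GitHub path from the table expanded to a full address. The script only prints the command so that nothing is changed by accident; any extra Python requirements the plugin may have are not covered here and should be checked in the repository README.

```python
from pathlib import Path
from typing import List

# Repository named in the table above, expanded to a full GitHub URL.
REPO_URL = "https://github.com/chenpipi0807/ComfyUI-Index-TTS"


def clone_command(comfy_root: str) -> List[str]:
    """Build the git command that clones the plugin into custom_nodes."""
    # Assumption: custom nodes are installed under <ComfyUI root>/custom_nodes.
    target = Path(comfy_root) / "custom_nodes" / "ComfyUI-Index-TTS"
    return ["git", "clone", REPO_URL, str(target)]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(clone_command("ComfyUI")))
```

After cloning, restart ComfyUI so the new nodes are registered.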

3. Workflow Structure

Group 1: Input & Voice Cloning

Inputs:
- Reference audio via LoadAudio.
- Timbre template via TimbreAudioLoader.
Steps:
1. IndexTTSNode generates speech from the input text (e.g., novel chapters) in the cloned voice.
2. The example uses these parameters: speed 1.0, emotion 0.8, seed 1155511506.

Group 2: Post-Processing

Input: Raw generated audio.
Steps:
1. AudioCleanupNode applies noise reduction over the 100-8000 Hz range.
2. SaveAudio exports the WAV using the filename prefix `audio/ComfyUI`.

Group 3: Preview & Output

Preview: Listen via PreviewAudio.
Output: WAV file (e.g., ComfyUI_20240513_142301.wav).
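The three groups above can be sketched as a ComfyUI API-format graph, where each node id maps to a `class_type` and its `inputs`, and a link is written as `[source_node_id, output_index]`. This is only an illustration under stated assumptions: the input names used for `IndexTTSNode` and `AudioCleanupNode` (`text`, `reference_audio`, `speed`, `emotion`, `seed`, `strength`) are hypothetical placeholders, the wiring between nodes is assumed, and the optional TimbreAudioLoader path is left out. Check the actual names in the node UI or by exporting the workflow in API format.

```python
import json


def build_workflow(text, reference_wav, seed=1155511506):
    """Return an API-format graph: node id -> {"class_type", "inputs"}."""
    return {
        "1": {"class_type": "LoadAudio", "inputs": {"audio": reference_wav}},
        "2": {
            "class_type": "IndexTTSNode",
            "inputs": {
                # Hypothetical input names; values are taken from the text above.
                "text": text,
                "reference_audio": ["1", 0],
                "speed": 1.0,
                "emotion": 0.8,
                "seed": seed,
            },
        },
        "3": {
            "class_type": "AudioCleanupNode",
            # Hypothetical input names; frequency_range is left at its default
            # because its value format is not given in the text.
            "inputs": {"audio": ["2", 0], "strength": 0.7},
        },
        "4": {
            "class_type": "SaveAudio",
            "inputs": {"audio": ["3", 0], "filename_prefix": "audio/ComfyUI"},
        },
        "5": {"class_type": "PreviewAudio", "inputs": {"audio": ["3", 0]}},
    }


def unresolved_links(graph):
    """List (node_id, input_name) pairs whose link points at a missing node."""
    missing = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and len(value) == 2 and value[0] not in graph:
                missing.append((node_id, name))
    return missing


if __name__ == "__main__":
    graph = build_workflow("Chapter 1 text goes here.", "蔡徐坤.wav")
    print(unresolved_links(graph))
    print(json.dumps(graph, ensure_ascii=False, indent=2))
```

A graph like this is normally submitted to a running ComfyUI instance as the `prompt` field of a POST to its `/prompt` endpoint; the sketch stops short of sending it so that it runs without a server.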

4. Inputs & Outputs

Inputs:

Text: Supports long texts (example: 4-chapter novel).
Reference Audio: Clear voice sample (≥10 sec recommended).
Timbre Template (optional): Style template (e.g., 抖音-读小说.wav).
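The 10-second recommendation for reference audio can be checked before loading the file. The sketch below uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files and nothing else; the function names and the default threshold argument are illustrative, not part of the plugin.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def is_long_enough(path, minimum_seconds=10.0):
    """True if the clip meets the recommended minimum length."""
    return wav_duration_seconds(path) >= minimum_seconds


if __name__ == "__main__":
    import sys

    # Usage: python check_reference.py path/to/reference.wav
    clip = sys.argv[1]
    print(f"{clip}: {wav_duration_seconds(clip):.1f} s, ok={is_long_enough(clip)}")
```

Length is only one part of the recommendation; a clip that passes this check can still be too noisy to clone well.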

Output:

WAV file saved in the `audio` subfolder of the ComfyUI output directory.

5. Notes

VRAM:
- Index-TTS needs about 4 GB of VRAM; if memory runs short, split long texts into smaller segments.
Quality Tips:
- Adjust frequency_range in AudioCleanupNode to preserve voice clarity.
Voice Control:
- Change seed in IndexTTSNode for different voice variations.
Debugging:
- Avoid special characters in Chinese text to prevent garbled speech.
- Pre-clean noisy reference audio for better cloning.
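Two of the notes above concern the input text: splitting long texts and avoiding special characters. A minimal preprocessing sketch follows, assuming an allow-list of CJK ideographs, ASCII letters and digits, whitespace, and common punctuation, plus an arbitrary limit of 200 characters per chunk. Both the allow-list and the limit are illustrative choices, not values taken from Index-TTS, and should be tuned to the text and the available VRAM.

```python
import re

# Anything NOT in this allow-list is removed (assumed set, adjust as needed).
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s，。！？；：、“”‘’（）,.!?;:()'\-]")
# Zero-width split point right after sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[。！？!?.；;])")


def clean_text(text):
    """Drop characters outside the allow-list and collapse whitespace runs."""
    text = _DISALLOWED.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def split_text(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as one over-long chunk.
    """
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks


if __name__ == "__main__":
    raw = "第一章★ 夜色很深。\n\n他推开门，走了进去！外面还在下雨。"
    for index, chunk in enumerate(split_text(clean_text(raw), max_chars=12), 1):
        print(index, chunk)
```

Each chunk can then be synthesized separately, and the resulting audio clips joined in order. Note that the ASCII period also counts as a sentence end here, so decimal numbers such as "1.0" may be split.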

Summary

Unlock Natural Speech Conversion with Index-TTS: Clone Voices, Enhance Audio & More!
