workflow

FantasyTalkingWav2VecEmbeds
- Converts audio to lip movement parameters
- Key params: 81 frames, audio_cfg_scale=23
WanVideoSampler
- Advanced sampler with UniPC scheduler
- Config: 30 steps, CFG=5
WanVideoImageToVideoEncode
- Temporal image encoder
- Default res: 832x480 (close to 16:9)
VHS_VideoCombine
- Requires VideoHelperSuite
- Outputs both H.264 MP4 and GIF
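
For reference, the settings listed above can be collected in one place. The sketch below is a plain Python dictionary, not a ComfyUI API. The key names inside each entry (`num_frames`, `scheduler`, `steps`, `cfg`, `width`, `height`) are illustrative labels and may not match the actual widget names. It also reduces the default resolution to its exact aspect ratio.

```python
from fractions import Fraction

# Settings as listed in this article; key names are illustrative labels,
# not guaranteed to match the actual widget names in each node.
NODE_SETTINGS = {
    "FantasyTalkingWav2VecEmbeds": {"num_frames": 81, "audio_cfg_scale": 23},
    "WanVideoSampler": {"scheduler": "unipc", "steps": 30, "cfg": 5},
    "WanVideoImageToVideoEncode": {"width": 832, "height": 480},
}

def aspect_ratio(width: int, height: int) -> Fraction:
    """Return the exact width:height ratio in lowest terms."""
    return Fraction(width, height)

enc = NODE_SETTINGS["WanVideoImageToVideoEncode"]
ratio = aspect_ratio(enc["width"], enc["height"])
print(f"{enc['width']}x{enc['height']} -> {ratio.numerator}:{ratio.denominator} "
      f"({float(ratio):.3f}; 16:9 is {16 / 9:.3f})")
```

832x480 reduces to 26:15, which is slightly narrower than a true 16:9 frame.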

Dependencies:

- Must install ComfyUI-WanVideoWrapper
- ~35GB model downloads required
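
Since the downloads are large, it can help to confirm free disk space before starting. A minimal sketch using only the standard library; the 35 figure is the approximate size quoted above, and the helper names `free_gb` and `has_room` are made up for this example.

```python
import shutil

REQUIRED_GB = 35  # approximate download size quoted in this article

def free_gb(path: str = ".") -> float:
    """Free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3

def has_room(path: str = ".", needed_gb: float = REQUIRED_GB) -> bool:
    """True if the filesystem holding `path` has at least `needed_gb` free."""
    return free_gb(path) >= needed_gb

if __name__ == "__main__":
    status = "OK" if has_room() else "not enough space"
    print(f"Free: {free_gb():.1f} GiB, need about {REQUIRED_GB}: {status}")
```

Run it from the drive that holds the ComfyUI folder, or pass that folder as `path`.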

3. Workflow Structure

Processing Stages:

Input Preparation
- Load character image (512x768) → resize via KJNodes
- Load WAV audio
Feature Extraction
- CLIP vision encoding (vit_h)
- T5 text encoding (Chinese umt5-xxl)
- wav2vec2 audio processing
Video Generation
- TeaCache for VRAM optimization
- FP8 mixed precision acceleration
Output
- 23fps MP4 + looped GIF
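
The frame count and frame rate quoted in this article (81 frames, 23 fps) determine the clip length. The worked example below computes it. The `4k + 1` check reflects a frame-count convention commonly used with Wan video models; it is an assumption here, not something this article states, and the helper names are illustrative.

```python
NUM_FRAMES = 81  # frame count listed for FantasyTalkingWav2VecEmbeds
FPS = 23         # output frame rate listed for the MP4

def clip_seconds(num_frames: int, fps: float) -> float:
    """Playback duration of num_frames shown at fps frames per second."""
    return num_frames / fps

def is_4k_plus_1(num_frames: int) -> bool:
    """Assumed Wan convention: frame counts of the form 4k + 1."""
    return num_frames % 4 == 1

def max_frames_for_audio(audio_seconds: float, fps: float) -> int:
    """Largest 4k + 1 frame count that fits within the audio length."""
    total = int(audio_seconds * fps)
    if total < 1:
        return 0
    return total - ((total - 1) % 4)

print(f"{NUM_FRAMES} frames at {FPS} fps = {clip_seconds(NUM_FRAMES, FPS):.2f} s")
```

At these settings the clip runs about 3.5 seconds, so audio much longer than that will be cut off unless the frame count is raised.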

4. I/O Specification

Inputs:

Source image: ComfyUI_temp_nupri_00001_.png
Audio: [jok老师]说得好像您带我以来我考好过几次一样.wav

Prompt:

Positive: "A woman talking to camera"  
Negative: "Overexposed, static, blurry details..."

Outputs:

MP4: WanVideoWrapper_I2V_FantasyTalking_[timestamp].mp4
GIF: Same prefix .gif

5. Critical Notes

Hardware Requirements
- Min VRAM: 12GB (FP16 mode)
- Recommended: RTX 3090/4090

Troubleshooting

- CUDA OOM: reduce block_size in WanVideoTorchCompileSettings (currently 128)
- Lip sync issues: adjust audio_cfg_scale
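
Both adjustments above are single-value edits, so they can be scripted against a workflow exported in ComfyUI's API format, where each node is an entry with `class_type` and `inputs`. A minimal sketch: the sample workflow below is hypothetical, the helper `set_input` is made up for this example, and the input names `block_size` and `audio_cfg_scale` are taken from this article rather than verified against the node definitions.

```python
import copy

def set_input(workflow: dict, class_type: str, name: str, value) -> dict:
    """Return a copy of an API-format workflow with `name` set on every
    node whose class_type matches. Raises KeyError if nothing matched."""
    updated = copy.deepcopy(workflow)
    hits = 0
    for node in updated.values():
        if node.get("class_type") == class_type and name in node.get("inputs", {}):
            node["inputs"][name] = value
            hits += 1
    if hits == 0:
        raise KeyError(f"no {class_type} node with input {name!r}")
    return updated

# Hypothetical two-node excerpt; real node ids and inputs will differ.
sample = {
    "1": {"class_type": "WanVideoTorchCompileSettings",
          "inputs": {"block_size": 128}},
    "2": {"class_type": "FantasyTalkingWav2VecEmbeds",
          "inputs": {"num_frames": 81, "audio_cfg_scale": 23}},
}

lower_mem = set_input(sample, "WanVideoTorchCompileSettings", "block_size", 64)
print(lower_mem["1"]["inputs"]["block_size"])
```

The value 64 is only a placeholder to show the edit; the original dictionary is left unchanged so that several variants can be generated and compared.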

Model Paths
- Wan models: ComfyUI/models/wanvideo/
- Audio models auto-download to: ComfyUI/models/wav2vec2/
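
A quick way to confirm the folders above exist before loading the workflow. This is a sketch under the assumption that the script runs from the directory containing `ComfyUI/`; the helper name `missing_dirs` is illustrative.

```python
from pathlib import Path

# Directories named in this article, relative to the folder containing ComfyUI/.
MODEL_DIRS = ["ComfyUI/models/wanvideo", "ComfyUI/models/wav2vec2"]

def missing_dirs(root: str = ".", rel_dirs=MODEL_DIRS) -> list[str]:
    """Return the entries of rel_dirs that do not exist under root."""
    base = Path(root)
    return [d for d in rel_dirs if not (base / d).is_dir()]

if __name__ == "__main__":
    for d in missing_dirs():
        print(f"missing: {d}")
```

A missing `wav2vec2` folder is expected before the first run, since the audio models are downloaded automatically.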


Summary

This workflow combines WanVideo and FantasyTalking for audio-driven lip sync, using multimodal conditioning (image, text, and audio) and producing two output formats (MP4 and GIF).
