Unlock the Power of Text-to-Video Generation with Aliyun's Wan2.1 Model
1. Workflow Overview

This workflow uses Aliyun's Wan2.1 model for Text-to-Video (T2V) generation. It chains text encoding, video diffusion, and VAE decoding to turn a text prompt into a video clip. Key features:
Supports Chinese prompts (e.g., "滑雪的男人" - "a man skiing")
Configurable frame rate (default: 16fps) and resolution (480x768)
Includes negative prompts for quality filtering
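The frame rate fixes how many generated frames make up one second of video, so clip length follows from a single division. The sketch below is only an illustration: the function name and the 81-frame count are hypothetical and do not come from the workflow itself.

```python
def clip_duration_seconds(num_frames: int, fps: float = 16.0) -> float:
    """Return the playback length of a clip in seconds."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# 81 frames is a hypothetical frame count chosen for illustration.
print(clip_duration_seconds(81))          # 5.0625
print(clip_duration_seconds(81, fps=24))  # 3.375
```

Raising the frame rate without generating more frames shortens the clip rather than making it smoother.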
2. Core Models
| Model Name | Function | Installation |
|---|---|---|
| Wan2.1-T2V-1.3B | Video diffusion backbone | Manual download ( |
| umt5-xxl-enc | Chinese text encoder | Place in |
| Wan2.1_VAE | Latent space decoder | Manual download |
3. Key Nodes
LoadWanVideoT5TextEncoder
Loads the Chinese text encoder (umt5-xxl-enc). Use bf16 precision to save VRAM.
WanVideoTextEncode
Processes positive/negative prompts. Example negative prompts filter low-quality content.
WanVideoModelLoader
Loads the main video model with options for fp32/fp16 and VRAM optimization.
WanVideoSampler
Core sampler parameters:
steps: 10 (lower for faster video generation)
cfg_scale: 6 (lower for creative freedom)
sampler: dpm++
VHS_VideoCombine
Combines frames into MP4 video with configurable:
Frame rate (16fps)
Output format (H.264, CRF=19)
Filename prefix (WanVideo2_1_T2V)
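To make the export settings concrete, here is a minimal sketch of an ffmpeg command that encodes a folder of frames with the same frame rate, codec, and CRF. It is an approximation for illustration only and does not claim to reproduce what VHS_VideoCombine does internally. The helper name, the frame filename pattern, and the output filename are hypothetical.

```python
import shlex


def build_ffmpeg_command(frame_pattern, output_path, fps=16, crf=19):
    """Build (but do not run) an ffmpeg command for H.264 MP4 export."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-framerate", str(fps),   # input frame rate of the image sequence
        "-i", frame_pattern,      # e.g. frames/frame_00001.png, frame_00002.png, ...
        "-c:v", "libx264",        # H.264 encoder
        "-crf", str(crf),         # constant rate factor: lower means higher quality
        "-pix_fmt", "yuv420p",    # widely compatible pixel format
        output_path,
    ]


# Hypothetical input pattern and output name, using the prefix from the workflow.
cmd = build_ffmpeg_command("frames/frame_%05d.png", "WanVideo2_1_T2V_00001.mp4")
print(shlex.join(cmd))
```

The command is returned as a list so it can be passed to `subprocess.run` without shell quoting issues.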
4. Workflow Structure
Group 1: Text Processing
Input: Chinese prompt
Output: Text embeddings
Key nodes: LoadWanVideoT5TextEncoder → WanVideoTextEncode
Group 2: Video Generation
Input: Text embeds + empty image embeds (480x768)
Output: Latent video data
Key nodes: WanVideoSampler
Group 3: Video Export
Input: Decoded image sequence
Output: MP4 file
Key nodes: WanVideoDecode → VHS_VideoCombine
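The three groups can be read as a small dependency graph. The sketch below models the nodes named in this article and derives a valid execution order with Python's standard `graphlib`. The edges into WanVideoSampler are inferred from the group descriptions and are an assumption. The VAE loading step is left out because the article does not name its node.

```python
from graphlib import TopologicalSorter

# Each key depends on the nodes in its set (its upstream inputs).
dependencies = {
    "WanVideoTextEncode": {"LoadWanVideoT5TextEncoder"},
    "WanVideoSampler": {
        "WanVideoModelLoader",   # assumed: supplies the diffusion model
        "WanVideoTextEncode",    # text embeds
        "WanVideoEmptyEmbeds",   # empty image embeds (480x768)
    },
    "WanVideoDecode": {"WanVideoSampler"},
    "VHS_VideoCombine": {"WanVideoDecode"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for step, node in enumerate(order, start=1):
    print(step, node)
```

Whatever order the loaders come out in, the sampler always runs after them and the export node always comes last.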
5. I/O Specifications
Input Parameters:
Resolution: 480x768 (set in WanVideoEmptyEmbeds)
Seed: Fixed/Random (example: 1057359483639287)
Prompts: Natural Chinese language (avoid complex syntax)
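The fixed/random seed choice can be sketched in a few lines: a fixed seed makes a run repeatable, while a random seed gives a new variation each time. This is an illustrative model only, not the sampler's actual code. The names `GenerationInputs`, `resolve_seed`, and `MAX_SEED`, as well as the 64-bit upper bound, are assumptions.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_SEED = 2**64 - 1  # assumed upper bound for a seed value


@dataclass(frozen=True)
class GenerationInputs:
    """Hypothetical container for the input parameters listed above."""
    prompt: str
    resolution: Tuple[int, int] = (480, 768)  # in the order the article writes it
    seed: int = 1057359483639287              # example seed from the article


def resolve_seed(mode: str, fixed_seed: Optional[int] = None) -> int:
    """Return the seed to use for one generation run."""
    if mode == "fixed":
        if fixed_seed is None:
            raise ValueError("fixed mode requires a seed value")
        return fixed_seed
    if mode == "random":
        return random.randint(0, MAX_SEED)
    raise ValueError(f"unknown seed mode: {mode!r}")


inputs = GenerationInputs(prompt="滑雪的男人", seed=resolve_seed("fixed", 1057359483639287))
print(inputs)
```

Reusing the same seed, prompt, and settings is what makes it possible to compare the effect of changing a single parameter such as steps or cfg_scale.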
Output:
MP4 video (saved to ComfyUI output folder)
Includes generation metadata
6. Notes
⚠️ VRAM Requirements
Minimum 12GB (16GB recommended)
Enable offload_device for optimization
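A rough back-of-the-envelope calculation shows why precision matters for VRAM. The sketch below estimates the memory taken by the weights of the 1.3B-parameter diffusion backbone alone, assuming 4 bytes per parameter for fp32 and 2 bytes for fp16/bf16. It deliberately ignores the text encoder, the VAE, and activations, so it is a lower bound and not a prediction of total usage. The function name is hypothetical.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Estimate the memory needed to hold model weights, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


# 1.3e9 parameters, taken from the model name Wan2.1-T2V-1.3B.
for precision in ("fp32", "fp16", "bf16"):
    print(f"{precision}: {weight_memory_gib(1.3e9, precision):.2f} GiB")
```

Under these assumptions the backbone weights take roughly 4.8 GiB in fp32 and 2.4 GiB in fp16 or bf16, which is one reason half precision is the usual choice on 12 GB cards.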
⚠️ Model Installation
Download Wan2.1 models manually from official sources
Text encoder path: models/wan_t5/umt5-xxl-enc-bf16.safetensors
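Before loading the workflow, it can help to confirm that the text encoder file is where the loader expects it. The check below is a minimal sketch that assumes the path above is relative to the ComfyUI installation directory. The `ComfyUI` root folder and the helper name are hypothetical and should be adjusted to the local setup.

```python
from pathlib import Path

# Path from the article, assumed to be relative to the ComfyUI root.
TEXT_ENCODER_RELATIVE_PATH = Path("models/wan_t5/umt5-xxl-enc-bf16.safetensors")


def find_text_encoder(comfyui_root):
    """Return the encoder file path if it exists under comfyui_root, else None."""
    candidate = Path(comfyui_root) / TEXT_ENCODER_RELATIVE_PATH
    return candidate if candidate.is_file() else None


# "ComfyUI" is a hypothetical install location.
found = find_text_encoder("ComfyUI")
if found is None:
    print(f"Text encoder not found; expected it at {TEXT_ENCODER_RELATIVE_PATH}")
else:
    print(f"Text encoder found: {found}")
```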
⚠️ Dependencies
Requires ComfyUI-WanVideoWrapper & VideoHelperSuite
Install via ComfyUI Manager