Unlock AI-Generated Videos with Wan2.1 Model Inference: T2V & I2V Made Easy

ComfyUI.org
2025-05-30 07:53:21

1. Workflow Overview

This workflow, titled "Wan2.1 Model Inference (T2V & I2V)", enables:

  • Text-to-Video (T2V): Generates videos from text prompts.

  • Image-to-Video (I2V): Converts input images to animated sequences (e.g., anime-style transitions).

  • Optimizations: Includes VRAM management, inference acceleration, and resolution control.

2. Core Models

  • Wan2.1-I2V-14B: Primary video generation model (accepts both text and image conditioning).

  • umt5-xxl-enc: Text encoder for prompt processing.

  • open-clip-xlm-roberta: Encodes input images for I2V mode.
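Why a 14B model needs VRAM management can be shown with simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below assumes a weights-only estimate (activations, encoders and VAE are ignored). It also includes a hypothetical `blocks_to_swap` helper that estimates how many equal-sized transformer blocks would have to live off the GPU, the idea behind WanVideoBlockSwap. The 40-block count and the 10 GiB budget are illustrative assumptions, not values from the workflow.

```python
import math

# Weights-only estimate: parameters x bytes per parameter.
# Activations, the text/image encoders and the VAE are NOT counted.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}
GIB = 1024 ** 3


def weight_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def blocks_to_swap(total_gib: float, num_blocks: int, budget_gib: float) -> int:
    """Hypothetical estimate: how many equal-sized blocks must stay off the GPU
    so that the remaining weights fit inside budget_gib."""
    excess = total_gib - budget_gib
    if excess <= 0:
        return 0
    per_block = total_gib / num_blocks
    return min(num_blocks, math.ceil(excess / per_block))


if __name__ == "__main__":
    for dtype in ("bf16", "fp8"):
        print(f"14B weights in {dtype}: {weight_gib(14e9, dtype):.1f} GiB")
    # Assumptions: 40 equal blocks, 10 GiB of VRAM reserved for weights.
    print("blocks to swap:", blocks_to_swap(weight_gib(14e9, "fp8"), 40, 10.0))
```

At bf16 the weights alone come to about 26.1 GiB and at fp8 about 13.0 GiB, which is why even a 16 GB card benefits from offloading part of the model.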

3. Key Nodes

Input & Encoding

  • LoadImage: Uploads input images (I2V mode).

  • WanVideoImageClipEncode: Encodes images into embeddings.

  • WanVideoTextEncode: Encodes the text prompt (drives T2V mode and accompanies the image in I2V mode).

Model & Inference

  • WanVideoModelLoader: Loads the Wan2.1 model (supports LoRA adapters).

  • WanVideoSampler: Runs the denoising loop that generates the video (e.g., steps=25, CFG=6).
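The CFG value passed to the sampler is a classifier-free guidance scale. The sketch below shows the textbook CFG combination applied at each step, with plain Python lists standing in for tensors; it is an illustration of the general formula, not a copy of WanVideoSampler's code.

```python
# Textbook classifier-free guidance (CFG): move the prediction away from the
# unconditional output and toward the prompt-conditioned output.

def cfg_combine(cond, uncond, cfg_scale):
    """Combine conditional and unconditional predictions element-wise."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]    # prediction with the prompt
    uncond = [0.5, 2.0]  # prediction with the negative/empty prompt
    print(cfg_combine(cond, uncond, 6.0))  # CFG=6, as in the workflow
```

With CFG=6 the difference between the two predictions is amplified six-fold; with a scale of 1 the result equals the conditional prediction, so the unconditional pass adds nothing.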

Optimizations

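  • WanVideoBlockSwap: Reduces VRAM usage by keeping a chosen number of transformer blocks in CPU memory and swapping them onto the GPU as needed.

  • WanVideoTeaCache: Speeds up inference by reusing cached results on steps where the model input changes little.

  • WanVideoSLG: Skip Layer Guidance; skips selected transformer blocks during a chosen portion of the sampling steps.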
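The two ingredients of SLG can be sketched in isolation: a step window controlled by a start (and optional end) percentage, and a forward pass that bypasses selected blocks. Everything below is hypothetical and illustrative: the function names, the toy blocks, and how "blocks=16-20" is read are assumptions, and how the skipped-block prediction is folded into guidance inside WanVideoSampler is not shown.

```python
# Hypothetical sketch of Skip Layer Guidance (SLG) mechanics.

def slg_active(step: int, total_steps: int, start_percent: float,
               end_percent: float = 1.0) -> bool:
    """True when this step's progress through sampling lies inside the SLG window."""
    progress = step / total_steps
    return start_percent <= progress <= end_percent


def run_blocks(x, blocks, skip=frozenset()):
    """Run a stack of blocks in order, bypassing the indices listed in `skip`."""
    for i, block in enumerate(blocks):
        if i in skip:
            continue  # skipped block: its input flows on unchanged
        x = block(x)
    return x


if __name__ == "__main__":
    total_steps = 25
    active = [s for s in range(total_steps) if slg_active(s, total_steps, 0.1)]
    print("SLG active from step", active[0], "for", len(active), "steps")

    toy_blocks = [lambda v: v + 1, lambda v: v * 10, lambda v: v - 2]
    print(run_blocks(1, toy_blocks))            # all blocks run
    print(run_blocks(1, toy_blocks, skip={1}))  # block 1 skipped
```

With 25 steps and start_percent=0.1, SLG stays off for the first three steps (progress 0.00 to 0.08) and is active for the remaining 22.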

Post-Processing

  • WanVideoDecode: Decodes latent frames to images.

  • VHS_VideoCombine: Renders final video (30FPS MP4).
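To make the decode step concrete, the sketch below relates video frames to latent frames. It assumes Wan2.1's VAE compresses 8x in height and width and 4x in time (the first frame encoded on its own) with 16 latent channels; the 81-frame clip length is an illustrative assumption, since this workflow's frame count is not stated.

```python
# Assumption: 8x spatial / 4x temporal compression, first frame kept
# separately, 16 latent channels.

def latent_shape(num_frames: int, height: int, width: int,
                 t_stride: int = 4, s_stride: int = 8, channels: int = 16):
    """Return (channels, latent_frames, latent_height, latent_width)."""
    if (num_frames - 1) % t_stride != 0:
        raise ValueError("num_frames must be of the form t_stride * n + 1")
    if height % s_stride or width % s_stride:
        raise ValueError("height and width must be multiples of s_stride")
    return (channels, (num_frames - 1) // t_stride + 1,
            height // s_stride, width // s_stride)


def decoded_frame_count(latent_frames: int, t_stride: int = 4) -> int:
    """Number of image frames produced when the latents are decoded."""
    return (latent_frames - 1) * t_stride + 1


if __name__ == "__main__":
    shape = latent_shape(81, 480, 832)  # 832x480 default, 81 frames assumed
    print(shape)
    frames = decoded_frame_count(shape[1])
    print(f"{frames} frames at 30 FPS = {frames / 30:.1f} s")
```

Under these assumptions an 81-frame 832x480 clip is sampled as a (16, 21, 60, 104) latent and, after decoding, plays for 2.7 s at 30 FPS.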

4. Workflow Structure (Groups)

  1. Image Input Zone

    • Input: An image (e.g., 透明.png, where 透明 means "transparent"); recommended size ≤480x480.

    • Key Nodes: LoadImage, WanVideoImageClipEncode.

  2. Loader Zone

    • Loads models/encoders:

      • WanVideoVAELoader (VAE).

      • LoadWanVideoT5TextEncoder (text encoder).

  3. Workspace (Core Logic)

    • Text/image encoding → model inference, with the optimization nodes (BlockSwap, TeaCache, SLG) feeding into the model and sampler.

    • Key Nodes: WanVideoSampler, WanVideoSLG.

  4. Post-Processing Zone

    • Video decoding & synthesis: WanVideoDecode, VHS_VideoCombine.
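The group layout implies a dependency order: loaders and encoders first, then sampling, then decoding and video assembly. The sketch below encodes assumed node-to-input links (inferred from the group descriptions; the real workflow may wire additional inputs) and uses Python's standard `graphlib` to produce one valid execution order. ComfyUI resolves execution from the actual node links itself; this only illustrates the dependency structure.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: node -> set of nodes whose outputs it consumes.
WORKFLOW = {
    "WanVideoImageClipEncode": {"LoadImage"},
    "WanVideoTextEncode": {"LoadWanVideoT5TextEncoder"},
    "WanVideoModelLoader": {"WanVideoBlockSwap"},
    "WanVideoSampler": {
        "WanVideoModelLoader",
        "WanVideoImageClipEncode",
        "WanVideoTextEncode",
        "WanVideoTeaCache",
        "WanVideoSLG",
    },
    "WanVideoDecode": {"WanVideoSampler", "WanVideoVAELoader"},
    "VHS_VideoCombine": {"WanVideoDecode"},
}


def execution_order(graph):
    """Return one valid order in which the nodes can run (dependencies first)."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in execution_order(WORKFLOW):
        print(name)
```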

5. Inputs & Outputs

  • Inputs:

    • Image (I2V) or text prompt (T2V).

    • Resolution: Default 832x480 (set in WanVideoImageClipEncode).

  • Output:

    • Video file (MP4, 30FPS), e.g., WanVideo2_1_T2V_00256.mp4.
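Fitting an arbitrary input image into the default 832x480 frame can be computed as a uniform scale-down followed by rounding. The sketch below is hypothetical: the helper name is illustrative, and rounding down to a multiple of 16 is an assumption about safe dimensions, not a documented behavior of ImageResizeKJ or WanVideoImageClipEncode. Exact fractions avoid floating-point rounding errors.

```python
from fractions import Fraction


def fit_resolution(width: int, height: int, max_width: int = 832,
                   max_height: int = 480, multiple: int = 16):
    """Scale (never upscale) to fit inside max_width x max_height, keeping the
    aspect ratio, then round each side down to a multiple of `multiple`."""
    scale = min(Fraction(max_width, width), Fraction(max_height, height), Fraction(1))
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h


if __name__ == "__main__":
    for size in [(1920, 1080), (1000, 700), (480, 480)]:
        print(size, "->", fit_resolution(*size))
```

A 1920x1080 image becomes 832x464 (832x468 rounded down), while a 480x480 image is left unchanged because the sketch never upscales.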

6. Notes & Tips

  • VRAM: The 14B model needs a GPU with 16 GB or more; enable BlockSwap (lower VRAM use) and TeaCache (faster inference).

  • Image Size: Resize large images with ImageResizeKJ to avoid OOM.

  • LoRA: Optional adapters like 馨染_Wan2.1 for style control.

  • Parameter Tips:

    • SLG: For 14B, use blocks=16-20, start_percent=0.1-0.15.

    • TeaCache: For 14B, set rel_l1_thresh=0.2, mode=speed.
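The rel_l1_thresh setting controls how often TeaCache actually runs the model. The simplified sketch below accumulates the relative L1 change of the input between steps; while the total stays under the threshold, it reuses the cached residual (output minus input) instead of calling the model. The class and helper names are hypothetical, and the real method measures change on the timestep-modulated input and rescales it with a model-specific polynomial; both refinements are omitted here.

```python
def rel_l1(a, b):
    """Relative L1 distance between the current and previous inputs."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b)
    return num / den if den else float("inf")


class TeaCacheSketch:
    """Hypothetical, simplified TeaCache-style step skipping."""

    def __init__(self, rel_l1_thresh: float = 0.2):
        self.rel_l1_thresh = rel_l1_thresh
        self.accumulated = 0.0
        self.prev_input = None
        self.cached_residual = None
        self.computed_steps = 0

    def __call__(self, x, model_fn):
        should_compute = True
        if self.prev_input is not None and self.cached_residual is not None:
            self.accumulated += rel_l1(x, self.prev_input)
            should_compute = self.accumulated >= self.rel_l1_thresh
        self.prev_input = list(x)
        if should_compute:
            out = model_fn(x)
            self.cached_residual = [o - i for o, i in zip(out, x)]
            self.accumulated = 0.0  # reset after a real model call
            self.computed_steps += 1
            return out
        # Skip the model: reuse the last residual on the new input.
        return [i + r for i, r in zip(x, self.cached_residual)]


if __name__ == "__main__":
    cache = TeaCacheSketch(rel_l1_thresh=0.2)
    model = lambda x: [2 * v for v in x]  # toy stand-in for the transformer
    for k in range(5):
        cache([1.0 + 0.05 * k, 2.0], model)  # inputs drift slowly
    print("model calls:", cache.computed_steps)
```

A higher threshold lets more change accumulate before the model is called again, trading fidelity for speed; a threshold of 0 disables skipping entirely.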
