Unlock the Power of AI-Generated Videos: A Comprehensive Workflow Guide

ComfyUI.org
2025-04-11 07:31:32

1. Workflow Overview

  • Purpose: Generates videos from text (T2V) or images (I2V) using Alibaba's Wan 2.1 (Tongyi Wanxiang) models, followed by post-processing such as super-resolution and frame interpolation.

  • Core Models:

    • Wan2_1-T2V-1.3B: Text-to-video base model (1.3B params).

    • Wan2_1-I2V-14B: Image-to-video model (14B params, FP8 quantized).

    • UMT5-XXL: Multilingual text encoder (supports Chinese prompts).

    • FILM-Net: Frame interpolation model (for smoother videos).

2. Key Nodes

| Node Name | Function | Installation | Dependencies |
|---|---|---|---|
| WanVideoSampler | Video sampling | Install ComfyUI-WanVideoWrapper | Requires .safetensors models |
| WanVideoTextEncode | Handles Chinese/English prompts | Same as above | Needs UMT5-XXL encoder |
| VHS_VideoCombine | Combines frames into MP4 | Install VideoHelperSuite via Manager | Requires FFmpeg |
| FILM VFI | Frame interpolation | Install ComfyUI-Frame-Interpolation | film_net_fp32.pt model |
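
Before loading the workflow, it can help to confirm that the external dependencies listed above are actually present. The following is a minimal sketch, not part of ComfyUI or any of the node packs: the function name `missing_dependencies` and the idea of searching a single models folder recursively are assumptions made for illustration, and the folder path should be adjusted to your own installation.

```python
import shutil
from pathlib import Path


def missing_dependencies(models_dir, model_files, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    models_dir  : folder that is searched recursively for model files (assumed layout)
    model_files : file names to look for, e.g. "film_net_fp32.pt"
    which       : lookup function for executables (injectable for testing)
    """
    problems = []

    # VHS_VideoCombine relies on FFmpeg being reachable on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    root = Path(models_dir)
    if not root.is_dir():
        problems.append(f"models folder not found: {root}")
        return problems

    for name in model_files:
        # rglob searches the folder and all of its subfolders for the file name.
        if not any(root.rglob(name)):
            problems.append(f"model file not found: {name}")
    return problems


if __name__ == "__main__":
    # "ComfyUI/models" is a hypothetical path; point it at your own install.
    for problem in missing_dependencies("ComfyUI/models", ["film_net_fp32.pt"]):
        print("WARNING:", problem)
```

An empty result only means the files exist; it does not verify that they are the correct versions for the nodes.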

3. Workflow Structure

  • Group 1: Text-to-Video (T2V)

    • Input: Prompts (e.g., "vibrant cartoon style"), negative prompts, seed.

    • Process: Text encoding via UMT5-XXL, video generation by Wan2_1-T2V.

    • Output: Raw video (720P).

  • Group 2: Image-to-Video (I2V)

    • Input: Reference image (e.g., ComfyUI_06397_.png), prompts.

    • Process: Uses CLIP-Vision for image features, Wan2_1-I2V for generation.

    • Output: Dynamic video (81+ frames).

  • Group 3: Post-Processing

    • Upscaling: ESRGAN_4x for higher resolution.

    • Frame Interpolation: Boosts FPS from 16 to 32 via FILM-Net.

    • Output: Final MP4 (H.264, CRF=19).
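
To make the post-processing numbers concrete, here is a small sketch of the arithmetic behind 2x interpolation and of an FFmpeg command that encodes frames as H.264 with CRF 19. It is an illustration only and is not how VHS_VideoCombine or FILM VFI are implemented. The frame-count formula assumes that new frames are inserted between each adjacent pair and that both endpoints are kept; the file pattern and output name are hypothetical.

```python
def interpolated_frame_count(frames, multiplier=2):
    """Frames after interpolation, assuming in-between frames are added
    between each adjacent pair and the first/last frames are kept."""
    if frames < 2:
        return frames
    return (frames - 1) * multiplier + 1


def ffmpeg_h264_command(frame_pattern, fps, out_path, crf=19):
    """Build an FFmpeg argument list that encodes an image sequence to H.264 MP4."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # input frame rate of the image sequence
        "-i", frame_pattern,      # e.g. "frames/%05d.png" (hypothetical path)
        "-c:v", "libx264",        # H.264 encoder
        "-crf", str(crf),         # quality setting; lower means higher quality
        "-pix_fmt", "yuv420p",    # widely compatible pixel format
        out_path,
    ]


if __name__ == "__main__":
    frames = interpolated_frame_count(81, multiplier=2)
    print(frames, "frames at 32 fps")
    print(" ".join(ffmpeg_h264_command("frames/%05d.png", 32, "final.mp4")))
```

The command list can be passed to `subprocess.run` if you want to encode outside ComfyUI; inside the workflow, VHS_VideoCombine handles this step.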

4. Input & Output

  • Key Inputs:

    • Prompts: Mix Chinese/English (e.g., "产品摄影,拉近镜头").

    • Resolution: Default 720x1280 (portrait) or 1280x720 (landscape).

    • Frames: I2V requires ≥81 frames.

  • Output:

    • Format: MP4 (H.264).

    • Path: Saved to ComfyUI/output/Hunyuan/videos/.
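
The input constraints above can be checked before queuing a long render. The sketch below is an illustration under stated assumptions: the 81-frame minimum comes from this guide, while the rules that the frame count should have the form 4n+1 and that width and height should be multiples of 16 are assumptions about the Wan 2.1 models that you should verify against the wrapper's own documentation. The function names are hypothetical.

```python
def check_inputs(width, height, frames, min_frames=81):
    """Return a list of issues with the requested video settings (empty if none)."""
    issues = []
    # Assumption: dimensions should be multiples of 16 for the model's latent grid.
    if width % 16 or height % 16:
        issues.append(f"resolution {width}x{height} is not a multiple of 16")
    # From the guide: I2V requires at least 81 frames.
    if frames < min_frames:
        issues.append(f"frame count {frames} is below the minimum of {min_frames}")
    # Assumption: valid frame counts have the form 4n + 1 (81 = 4 * 20 + 1).
    if frames % 4 != 1:
        issues.append(f"frame count {frames} is not of the form 4n+1")
    return issues


def clip_seconds(frames, fps):
    """Duration of a clip in seconds."""
    return frames / fps


if __name__ == "__main__":
    print(check_inputs(1280, 720, 81))   # landscape default, no issues expected
    print(clip_seconds(81, 16))          # length of the raw 16 fps clip
```

At 16 fps, the 81-frame minimum corresponds to a clip of roughly five seconds.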

5. Notes

  • VRAM Requirements:

    • I2V model needs a GPU with ≥16 GB of VRAM (even with FP8 quantization, it still demands a high-end card).

    • Enable torch.compile for speed (requires CUDA 12+).

  • Common Issues:

    • Frame count <81: Adjust WanVideoBlockSwap settings.

    • OOM errors: Reduce resolution or disable bf16.

  • Optimization:

    • Use TeaCache node to save VRAM (enabled by default).

    • Include motion keywords in prompts (e.g., "zoom in").
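
A rough back-of-the-envelope calculation shows why the 14B I2V model needs a large card even at FP8. The sketch below estimates the memory taken by the weights alone from the parameter count and bytes per parameter. It deliberately ignores activations, the text encoder, the VAE, and framework overhead, so real usage will be higher; the function name is hypothetical.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def weight_gib(params_billion, dtype):
    """Approximate size of the model weights in GiB (weights only)."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 2**30


if __name__ == "__main__":
    for name, params, dtype in [
        ("Wan2_1-I2V-14B", 14, "fp8"),
        ("Wan2_1-I2V-14B", 14, "bf16"),
        ("Wan2_1-T2V-1.3B", 1.3, "bf16"),
    ]:
        print(f"{name} ({dtype}): ~{weight_gib(params, dtype):.1f} GiB of weights")
```

At FP8 the 14B weights alone come to about 13 GiB, which leaves little headroom on a 16 GB card and is why reducing resolution is the first thing to try when you hit OOM errors.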