From Images to Videos: A Deep Dive into the Wan2.1-I2V Workflow

ComfyUI.org
2025-04-01 14:20:43

1. Workflow Overview


This workflow uses Alibaba's Wan2.1 model to generate videos from static images (image-to-video, I2V). Key features:

  • Extracts image features with a CLIP vision encoder

  • Processes multilingual prompts with a T5 text encoder

  • Generates video latents with the 14B-parameter Wan2.1-I2V model

  • Outputs animated WEBP or MP4 files


2. Core Models

| Model Name | Function | File Source |
|---|---|---|
| Wan2.1-I2V-14B | Main video generator (480P) | Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors |
| UMT5-XXL Text Encoder | Handles multilingual prompts | umt5-xxl-enc-fp8_e4m3fn.safetensors |
| OpenCLIP Vision Encoder | Extracts image semantics | open-clip-xlm-roberta-large-vit-huge-14_visual_fp16.safetensors |
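
Before loading the workflow, it can help to confirm that the three files above are actually on disk. The following is a minimal sketch, assuming all files sit in one folder; the `models/wanvideo` path is taken from the Notes section and may differ on your install. The helper name `missing_models` is ours and is not part of ComfyUI.

```python
from pathlib import Path

# File names as listed in the Core Models section.
REQUIRED_MODELS = [
    "Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors",
    "umt5-xxl-enc-fp8_e4m3fn.safetensors",
    "open-clip-xlm-roberta-large-vit-huge-14_visual_fp16.safetensors",
]


def missing_models(model_dir):
    """Return the required file names that are not present in model_dir."""
    folder = Path(model_dir)
    return [name for name in REQUIRED_MODELS if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumed location; change this to wherever your install keeps the models.
    missing = missing_models("models/wanvideo")
    if missing:
        print("Missing model files:")
        for name in missing:
            print("  " + name)
    else:
        print("All model files found.")
```

Run it from the ComfyUI root folder so the relative path resolves correctly.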


3. Key Nodes

| Node Name | Function | Installation | Dependencies |
|---|---|---|---|
| WanVideoSampler | Controls video sampling (frames/CFG) | Requires WanVideo plugin | Main model + VAE |
| WanVideoImageClipEncode | Encodes input image to latent | Same as above | CLIP vision model |
| VHS_VideoCombine | Combines frames (supports audio) | Install ComfyUI-VideoHelperSuite | FFmpeg required |
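
Since VHS_VideoCombine depends on FFmpeg, a quick check can rule that out as a cause when video export fails. This is a small sketch that assumes FFmpeg is expected on the system `PATH`; some installs locate it another way, in which case a negative result here does not prove it is missing. The function name `ffmpeg_available` is ours.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable is found on the system PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at:", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found on PATH; video export may fail.")
```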


4. Workflow Structure

  • Group 1: Input Processing

    • LoadImage: Loads input image (e.g., 576x1024)

    • WanVideoTextEncode: Processes prompts (e.g., "A smiling ancient beauty")

  • Group 2: Model Loading

    • LoadWanVideoT5TextEncoder: Loads T5 encoder

    • WanVideoModelLoader: Loads 14B video model

  • Group 3: Video Generation

    • WanVideoSampler: Generates latent (30 frames, CFG=6)

    • WanVideoDecode: Decodes to image sequence via VAE
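
The three groups imply an execution order: loaders first, then the encoders, then the sampler, then decoding and export. The sketch below writes that dependency graph down and sorts it. The node names come from this article, but the edges are our reading of the groups above, not an export of the actual workflow file, so treat them as an assumption.

```python
from graphlib import TopologicalSorter

# Each node maps to the nodes whose output it consumes.
# Edges are inferred from the group descriptions (an assumption),
# not read from the workflow JSON.
DEPENDS_ON = {
    "LoadImage": set(),
    "LoadWanVideoT5TextEncoder": set(),
    "WanVideoModelLoader": set(),
    "WanVideoTextEncode": {"LoadWanVideoT5TextEncoder"},
    "WanVideoImageClipEncode": {"LoadImage"},
    "WanVideoSampler": {
        "WanVideoModelLoader",
        "WanVideoTextEncode",
        "WanVideoImageClipEncode",
    },
    "WanVideoDecode": {"WanVideoSampler"},
    "VHS_VideoCombine": {"WanVideoDecode"},
}


def execution_order(graph):
    """Return one valid order in which every node runs after its inputs."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, node in enumerate(execution_order(DEPENDS_ON), start=1):
        print(step, node)
```

Several orders are valid because the three loaders are independent of each other; what matters is that WanVideoSampler runs only after the model, text, and image branches are ready.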


5. Inputs & Outputs

  • Required Inputs:

    • Image file (PNG/JPG)

    • Positive prompt (e.g., style description)

    • Negative prompt (e.g., "low quality, static")

  • Outputs:

    • Animated WEBP (default) or MP4

    • Resolution: 272x272 (adjustable)
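
If you queue this workflow from a script, a small pre-flight check on the required inputs can catch mistakes before a long generation starts. This is a hedged sketch: `validate_inputs` is a hypothetical helper, and treating `.jpeg` as equivalent to JPG is our assumption.

```python
from pathlib import Path

# Extensions accepted for the input image (PNG/JPG; .jpeg is assumed equivalent).
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def validate_inputs(image_path, positive_prompt, negative_prompt):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if Path(image_path).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("image must be a PNG or JPG file")
    if not positive_prompt.strip():
        problems.append("positive prompt is empty")
    if not negative_prompt.strip():
        problems.append("negative prompt is empty")
    return problems


if __name__ == "__main__":
    issues = validate_inputs(
        "portrait.png",              # hypothetical file name
        "A smiling ancient beauty",  # example prompt from this article
        "low quality, static",       # example negative prompt from this article
    )
    print(issues if issues else "Inputs look fine.")
```

The check only looks at the file name and the prompt strings; it does not open the image or verify that it exists.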


6. Notes

⚠️ Troubleshooting:

  1. VRAM: The 14B model needs a GPU with at least 16 GB of VRAM; enable bf16 precision.

  2. Plugin: Manual installation is required:

    git clone https://github.com/AI-ModelScope/comfyui-wanvideo-plugin

  3. Models: Place all .safetensors files in models/wanvideo/
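
To put the VRAM note in perspective, here is some rough arithmetic on the size of the weights alone. It is a back-of-the-envelope sketch: it treats "14B" as exactly 14 billion parameters and ignores activations, the text encoder, the vision encoder, and the VAE, so real usage will be higher.

```python
def weight_size_gib(num_params, bytes_per_param):
    """Approximate size of the model weights in GiB."""
    return num_params * bytes_per_param / 2**30


# "14B" taken as 14e9 parameters (an approximation).
fp8 = weight_size_gib(14e9, 1)   # fp8 stores 1 byte per parameter
bf16 = weight_size_gib(14e9, 2)  # bf16 stores 2 bytes per parameter

print(f"fp8 weights:  {fp8:.1f} GiB")   # prints 13.0 GiB
print(f"bf16 weights: {bf16:.1f} GiB")  # prints 26.1 GiB
```

By this estimate the fp8 checkpoint listed under Core Models is about 13 GiB, which is why it can fit on a 16 GB card, while the same weights stored in bf16 would not.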