Unlock the Power of Image Animation: Transform Static Portraits into Dynamic Videos

ComfyUI.org
2025-04-23 10:19:17

1. Workflow Overview


This "Wan Image-to-Video" workflow transforms static images into dynamic videos, e.g., making a portrait blink or smile. Powered by the specialized Wan 2.1 model, its key features are:

  • Preserving original image details

  • Generating natural motions

  • Outputting 720P HD videos (24FPS)

2. Core Models

  • Main Model: wan2.1-i2v-14b, quantized to 4-bit GGUF

  • CLIP Vision: clip_vision_h.safetensors

  • Video VAE: Custom wan_2.1_vae
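A quick back-of-the-envelope check (an illustrative estimate, not an official figure) shows why the 4-bit GGUF quantization matters: a 14B-parameter model at 4 bits per weight needs roughly 7 GB for weights alone, which is what lets the workflow fit on the 8 GB cards noted below.

```python
def gguf_weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (ignores activations,
    temporary buffers, and per-block quantization overhead)."""
    return n_params * bits_per_weight / 8 / 1e9

# wan2.1-i2v-14b at 4-bit: ~7 GB of weights,
# versus ~28 GB for the same model in fp16.
print(round(gguf_weight_footprint_gb(14e9, 4), 1))   # → 7.0
print(round(gguf_weight_footprint_gb(14e9, 16), 1))  # → 28.0
```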

3. Key Nodes

| Node | Function | Installation |
|------|----------|--------------|
| WanImageToVideo | Core image animation | Requires ComfyUI-Wan |
| VHS_VideoCombine | Frame-to-video synthesis | Video Helper Suite plugin |

Dependencies:

  • Input image 1.png

  • umt5-xxl text encoder (GGUF format)

4. Pipeline Stages

Stage 1: Initialization

  • Load Wan model trio (UNet/CLIP/VAE)

  • Encode the reference image with CLIP Vision

Stage 2: Motion Control

  • Positive prompt drives actions (e.g., "slowly turns head")

  • Negative prompt filters artifacts
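The two prompts work as a pair: the positive prompt names the motion, while the negative prompt lists failure modes to suppress. A minimal sketch of composing that pair (the helper and the exact phrasing are assumptions for illustration, not part of the workflow's nodes):

```python
def build_prompts(action: str,
                  artifacts: tuple = ("blurry", "distorted", "static")) -> tuple:
    """Hypothetical helper: compose the positive/negative prompt pair fed to
    the text encoder. Keeping 'static' in the negative list discourages the
    model from producing frozen, motionless frames."""
    positive = f"portrait, {action}, natural motion, high detail"
    negative = ", ".join(artifacts)
    return positive, negative

pos, neg = build_prompts("slowly turns head")
print(pos)  # → portrait, slowly turns head, natural motion, high detail
print(neg)  # → blurry, distorted, static
```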

Stage 3: Rendering

  • uni_pc sampler (20 steps)

  • MP4 output via FFmpeg
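Conceptually, the final step is an ffmpeg invocation over the rendered frame sequence. The sketch below assembles a representative command line (paths, frame numbering, and codec flags are assumptions; the actual flags VHS_VideoCombine uses depend on its configured format):

```python
def ffmpeg_args(frame_dir: str, fps: int, out: str) -> list:
    """Illustrative sketch of an ffmpeg call that stitches numbered PNG
    frames into an MP4, similar to what VHS_VideoCombine does internally."""
    return [
        "ffmpeg",
        "-framerate", str(fps),            # input frame rate (24 here)
        "-i", f"{frame_dir}/%05d.png",     # numbered frames from the sampler
        "-c:v", "libx264",                 # H.264 for broad compatibility
        "-pix_fmt", "yuv420p",             # required by most players
        out,
    ]

print(" ".join(ffmpeg_args("frames", 24, "wan_i2v_0001.mp4")))
```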

5. I/O Specification

  • Inputs:

    • Source image (e.g., portrait)

    • Action description in English

  • Output:

    • 720P video (wan_i2v_xxxx.mp4)
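The "xxxx" in the output name is a zero-padded run counter. A small sketch of that naming pattern (the prefix and four-digit padding are inferred from the filename above; the counter logic itself is an assumption):

```python
def output_name(index: int, prefix: str = "wan_i2v") -> str:
    """Build the output filename: prefix plus a zero-padded run counter.
    Padding width (4) is assumed from the wan_i2v_xxxx.mp4 pattern."""
    return f"{prefix}_{index:04d}.mp4"

print(output_name(7))    # → wan_i2v_0007.mp4
print(output_name(123))  # → wan_i2v_0123.mp4
```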

6. Critical Notes

⚠️ Requirements:

  • 8GB+ VRAM (optimized with GGUF)

  • NVIDIA 30/40 series recommended

🔧 Pro Tips:

  • Adjust tile_size in VAEDecodeTiled to trade decoding speed against peak VRAM

  • Keep "static" in negative prompts for smoother motion
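To see why tile_size matters, consider how many decode passes a 720p frame requires: larger tiles mean fewer passes (faster) but more VRAM per pass. The model below is a simplification for intuition, with an assumed overlap value; the real node tiles in latent space with its own overlap handling:

```python
import math

def tile_count(width: int, height: int, tile_size: int, overlap: int = 64) -> int:
    """Simplified estimate of decode passes for tiled VAE decoding:
    tiles advance by (tile_size - overlap) in each dimension."""
    stride = tile_size - overlap
    nx = max(1, math.ceil((width - overlap) / stride))
    ny = max(1, math.ceil((height - overlap) / stride))
    return nx * ny

# Fewer, larger tiles cost more VRAM per pass but finish sooner (720p frame):
for ts in (256, 512, 768):
    print(ts, tile_count(1280, 720, ts))
# → 256 28
# → 512 6
# → 768 2
```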