Unlock Anime-Style Video Magic: A Step-by-Step WAN2.1 Workflow Guide

CN
ComfyUI.org
2025-03-27 12:24:49

1. Workflow Overview

m8rbseg62jfaentlpjt93dbf108f0ebe5a132c3f339d783763ddd6b8b95d58b4ff6236d50a70a21b167.png

This is an anime-style video generation workflow based on WAN2.1 model (DaKai optimized), featuring:

  • Convert input video (e.g., dance.mp4) to anime style

  • Dynamic prompts for character details (19yo Chinese schoolgirl dancing)

  • HunyuanLoom technology for motion coherence

  • Outputs 16fps MP4 video (H.264 encoded)

2. Core Models

Model Name

Description

wan2.1_t2v_1.3B_fp16

Main video generation model (1.3B params)

umt5_xxl_fp16

Multilingual CLIP text encoder

wan_2.1_1.3b_vae

Lightweight VAE for color accuracy

3. Key Components

Special Nodes:

  • VHS_LoadVideo: Frame extraction (skips 120 frames, keeps 81)

  • HYFlowEditGuiderCFG: Dynamic CFG guidance (CFG=7.5)

  • SamplerCustomAdvanced: Advanced sampler (16 steps, simple scheduler)

  • VAEDecodeTiled: Tiled decoding (512x512 tiles, 64px overlap)

Installation:

  1. Video Helper Suite: Install via ComfyUI Manager

  2. HunyuanLoom: Manual install from GitHub

  3. WAN2.1 Models: Download separately to models folder

4. Workflow Structure

Group 1: Load Models

  • Load UNET/CLIP/VAE (wan2.1_t2v_1.3B_fp16 + umt5_xxl_fp16)

  • Apply ApplyTeaCachePatch (strength=0.1) for acceleration

Group 2: Prompts

  • Positive Prompt: Detailed character/scene description (amber eyes, JK uniform, beach sunset)

  • Negative Prompt: 60+ filters for realism/low-quality

  • FluxGuidance boosts prompt weights

Group 3: Video Input

  • Resize input to 832x480 (nearest-exact)

  • Tiled VAE encoding

Group 4: Sampling & Output

  • Generate latent with HYFlowEditSampler (seed=123478)

  • Tiled decode + video render (CRF=19, yuv420p)

5. Inputs & Outputs

Inputs:

  • Video file (default: dance.mp4)

  • Pre-configured prompts

  • Frame rate (16fps)

Outputs:

  • MP4 video (e.g., hyloom_00003.mp4)

  • Optional intermediate frames

6. Notes

  • VRAM: 16GB+ recommended (1.3B model is VRAM-heavy)

  • Video Specs: Input ≥1280x720, ≤30sec duration

  • Motion Tuning: Adjust motion_coherence in HYFlowEditGuiderCFG if flickering occurs

  • Common Error: Missing umt5_xxl_fp16 breaks CLIP encoding