1. Workflow Overview

Purpose: Transforms input videos into stylized animations using the Wan2.1 model, with dual control from line art (AnimeLineArt) and depth maps (DepthAnything).
Key Tech: Combines ControlNet guidance, T5 text encoding, and frame interpolation to handle dynamic content.

2. Core Models

Model Name	Function
Wan2.1-Fun-Control-14B	Main model for video generation (FP8 optimized).
AnimeLineArtPreprocessor	Extracts line art from input video for style control.
DepthAnythingPreprocessor	Generates depth maps for spatial consistency.
Florence2-Flux-Large	Auto-generates captions for video frames.

3. Key Nodes & Installation

Node Name	Function	Installation
WanVideoWrapper	Core nodes for video generation (model loading, sampling, encoding).	GitHub: `ComfyUI-WanVideoWrapper`
ControlNet Aux	Preprocessors for line art and depth maps.	ComfyUI Manager: `comfyui-controlnet-aux`
Video Helper Suite	Video loading/combining tools.	ComfyUI Manager: `comfyui-videohelpersuite`
Florence2	Image captioning.	GitHub: `comfyui-florence2`

Required Models:

Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors (main video model, FP8).
umt5-xxl-enc-bf16.safetensors (T5 encoder).
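
Before opening the workflow, it can help to confirm that both files are in place. The following is a minimal sketch of such a check. The file names come from the list above, but the `ComfyUI/models` root and the `diffusion_models` and `text_encoders` subfolders are assumptions about a typical install, not something this workflow specifies, so adjust them to match your setup.

```python
from pathlib import Path

# File names are taken from the list above.
# The subfolder for each file is an assumption, not confirmed by the workflow.
REQUIRED_MODELS = {
    "Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors": "diffusion_models",
    "umt5-xxl-enc-bf16.safetensors": "text_encoders",
}


def find_missing_models(models_root: Path) -> list[str]:
    """Return the required file names that are not present under models_root."""
    missing = []
    for filename, subfolder in REQUIRED_MODELS.items():
        if not (models_root / subfolder / filename).is_file():
            missing.append(filename)
    return missing


if __name__ == "__main__":
    root = Path("ComfyUI/models")  # hypothetical install location
    for name in find_missing_models(root):
        print(f"missing: {name}")
```

An empty result means both files were found at the assumed locations.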

4. Workflow Structure

Input Group (labeled 上传视频及参考图, "Upload video and reference image"):
- Inputs: Raw video (VHS_LoadVideo), reference image (LoadImage).
- Process:
  - Frame extraction → Line art + depth map generation.
  - Caption generation via Florence2Run.
- Outputs: Preprocessed images + text prompts.
Model Loading (labeled wan模型, "Wan model"):
- Loads the Wan2.1 model, T5 encoder, and VAE, and configures optimizations (TorchCompile, BlockSwap).
Generation Group (labeled 采样生成, "Sampling and generation"):
- Inputs: Preprocessed images, text prompts, control args.
- Process:
  - Text encoding (WanVideoTextEncode) → Image encoding (WanVideoImageToVideoEncode) → Sampling (WanVideoSampler).
- Outputs: Latent video representation.
Output Group:
- Decodes latent to images (WanVideoDecode) → Combines video (VHS_VideoCombine).
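
The group structure above can be read as a dependency graph. The sketch below writes that graph down and derives one valid execution order with Python's standard `graphlib`. The node names are taken from the text. The exact wiring is an assumption made for illustration, not an export of the actual workflow: in particular, that Florence2Run captions the loaded video frames and that the reference image feeds WanVideoImageToVideoEncode.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on.
# Node names come from the text; the edges are a simplified, assumed wiring.
PIPELINE = {
    "VHS_LoadVideo": set(),
    "LoadImage": set(),
    "AnimeLineArtPreprocessor": {"VHS_LoadVideo"},
    "DepthAnythingPreprocessor": {"VHS_LoadVideo"},
    "Florence2Run": {"VHS_LoadVideo"},
    "WanVideoTextEncode": {"Florence2Run"},
    "WanVideoImageToVideoEncode": {
        "AnimeLineArtPreprocessor",
        "DepthAnythingPreprocessor",
        "LoadImage",
    },
    "WanVideoSampler": {"WanVideoTextEncode", "WanVideoImageToVideoEncode"},
    "WanVideoDecode": {"WanVideoSampler"},
    "VHS_VideoCombine": {"WanVideoDecode"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one order in which every node runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, node in enumerate(execution_order(PIPELINE), start=1):
        print(f"{step:2d}. {node}")
```

Several orders are valid for the preprocessing nodes, since line art, depth, and captioning do not depend on each other; only the sampling, decoding, and combining steps are strictly sequential.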

5. Inputs & Outputs

Inputs:
- Video (MP4), reference image (PNG).
- Resolution: 768x768 (adjusted via ImageResizeKJ).
- Prompts: Auto-generated (Florence2) or manual (example includes positive/negative prompts).
Output:
- Stylized video (H.264 MP4, 16 fps).
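
As a quick sanity check on clip length, the output frame rate can be combined with the number of frames loaded. The sketch below assumes the output contains the same number of frames as were loaded (the notes in this guide mention a cap of 81), which may not hold if frames are skipped or interpolated. The function name is illustrative.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Duration of a clip with num_frames frames played back at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    # 81 frames at 16 fps, the values quoted in this guide.
    print(clip_duration_seconds(81, 16))  # 5.0625
```

Under that assumption, the default settings produce a clip of roughly five seconds.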

6. Notes

VRAM: Minimum 16 GB (24 GB or more recommended, given the size of Wan2.1).
Common Errors:
- Frame limit exceeded: Adjust frame_load_cap (currently 81 frames).
- Line art failure: Ensure input video has motion.
Optimization:
- Enable fp8 mode for lower VRAM usage.
- Tweak BlockSwap for memory management.
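
To see why fp8 matters here, a rough estimate of weight memory is enough. The sketch below assumes a nominal 14 billion parameters (read off the "14B" in the model name) and counts the main model's weights only. Activations, the T5 encoder, the VAE, and framework overhead are not included, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    params = 14e9  # assumed nominal parameter count
    print(f"fp8:  {weight_memory_gib(params, 1):.1f} GiB")  # 13.0 GiB
    print(f"bf16: {weight_memory_gib(params, 2):.1f} GiB")  # 26.1 GiB
```

At one byte per parameter the weights alone come close to a 16 GB card's capacity, which is consistent with the advice to offload blocks through BlockSwap when memory is tight.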


Summary

This workflow turns input videos into stylized animations with the Wan2.1 model, guided by AnimeLineArt line art and DepthAnything depth maps. It combines ControlNet guidance, T5 text encoding, and frame interpolation to handle dynamic content.

Chapter

workflow:

CustomNodes:

WanVideoEnhanceAVideo WanVideo...