From Pose to Playback: Mastering Video Generation with Tongyi Wanxiang's Fun-ControlNet
1. Workflow Overview

This workflow, titled "Tongyi Wanxiang-WAN2.1-Fun ControlNet Video Generation [Pose/Depth Control]", is designed for:
Video Generation: Creates dynamic videos from input control signals (e.g., pose/depth maps).
Structure Control: Uses Fun-ControlNet to constrain layout and motion (e.g., character pose or scene depth).
Post-Processing: Includes video upscaling, frame interpolation, and final rendering.
2. Core Models
WAN2.1-Fun-ControlNet: Main video generation model with multi-modal control.
Meta-Llama-3.1-8B: Language model used by Joy_caption_two to write captions for reference images.
FILM VFI: Frame interpolation model for smoother motion.
4x_foolhardy_Remacri: Upscales video resolution.
3. Key Nodes
Video Generation
WanVideoModelLoader: Loads the WAN2.1-Fun-ControlNet model.
WanVideoSampler: Generates video frames with configurable parameters (steps, CFG scale).
WanVideoDecode: Decodes latent frames to images.
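The CFG scale exposed by WanVideoSampler follows the general classifier-free guidance idea. At each denoising step, a prompt-conditioned prediction is blended with an unconditioned (or negative-prompt) prediction. The minimal sketch below shows that blend on plain Python lists. It assumes the textbook formula, and `apply_cfg` is a hypothetical helper, not code from the sampler node.

```python
def apply_cfg(cond: list[float], uncond: list[float], cfg_scale: float) -> list[float]:
    """Classifier-free guidance: uncond + cfg_scale * (cond - uncond).

    Hypothetical helper for illustration; real samplers apply this to
    latent tensors at every step, not to Python lists.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond predictions must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# cfg_scale = 1.0 reproduces the conditioned prediction exactly;
# larger values push the result further away from the unconditioned one.
print(apply_cfg([1.0, 2.0], [0.0, 1.0], 1.0))  # [1.0, 2.0]
print(apply_cfg([1.0, 2.0], [0.0, 1.0], 6.0))  # [6.0, 7.0]
```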
Control Signal Processing
AIO_Preprocessor: Preprocesses control maps (e.g., pose/depth).
WanVideoControlEmbeds: Encodes control signals.
Post-Processing
FILM VFI: Interpolates frames for smoother playback.
ImageUpscaleWithModel: Enhances video resolution.
VHS_VideoCombine: Renders final video (supports audio merging).
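To see how the post-processing nodes change the clip, the sketch below estimates frame count, resolution, and frame rate after interpolation and upscaling. It rests on several assumptions not stated in the workflow. Interpolation is taken to insert (multiplier − 1) frames between each consecutive pair. The output fps is raised by the same factor. The raw clip is assumed to be 16 fps. `ClipSpec` and `post_process` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ClipSpec:
    frames: int
    width: int
    height: int
    fps: float

    @property
    def duration_s(self) -> float:
        return self.frames / self.fps


def post_process(spec: ClipSpec, interp_multiplier: int = 2, upscale: int = 4) -> ClipSpec:
    """Estimate clip size after frame interpolation and model upscaling.

    Assumes interpolation inserts (interp_multiplier - 1) new frames between
    each consecutive pair, and the output fps is multiplied by the same
    factor so playback speed is preserved.
    """
    if spec.frames < 1 or interp_multiplier < 1 or upscale < 1:
        raise ValueError("frames, multiplier and upscale must be positive")
    return ClipSpec(
        frames=(spec.frames - 1) * interp_multiplier + 1,
        width=spec.width * upscale,
        height=spec.height * upscale,
        fps=spec.fps * interp_multiplier,
    )


raw = ClipSpec(frames=49, width=480, height=832, fps=16)  # 16 fps is an assumption
out = post_process(raw)
print(out.frames, out.width, out.height, out.fps)  # 97 1920 3328 32
```

The clip duration stays roughly the same (about 3 s in both cases). Only motion smoothness and pixel count grow.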
Utilities
Joy_caption_two: Generates text prompts from reference images.
easy cleanGpuUsed: Clears GPU memory to prevent overflow.
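The internals of easy cleanGpuUsed are not covered here. As a rough picture of what such a node typically does, the sketch below uses standard PyTorch calls to release cached GPU memory between heavy stages. It falls back to a no-op when PyTorch or CUDA is unavailable. `free_gpu_memory` is a hypothetical name.

```python
import gc


def free_gpu_memory() -> bool:
    """Release Python garbage and PyTorch's cached CUDA blocks.

    Returns True if a CUDA device was found and its cache was cleared.
    Hypothetical helper; it cannot free tensors that are still referenced.
    """
    gc.collect()  # drop unreachable Python objects (and any tensors they hold)
    try:
        import torch
    except ImportError:
        return False  # PyTorch not installed: nothing more to do
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # return cached, unused blocks to the driver
    torch.cuda.ipc_collect()  # reclaim memory released via CUDA IPC
    return True


if __name__ == "__main__":
    print("CUDA cache cleared:", free_gpu_memory())
```

`empty_cache()` only hands back blocks that nothing references. Tensors still held by loaded models stay resident.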
4. Workflow Structure (Groups)
Input Control Video Group
Input: Uploaded video or control images (e.g., pose maps).
Key Nodes: VHS_LoadVideo, ImageResizeKJ (resizes input).
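Resizing matters because the video and control frames must share one resolution, and video models generally expect dimensions that divide evenly into their latent grid. The sketch below makes these assumptions (not taken from this workflow):

* The default 480x832 means width x height.
* Both sides should be multiples of 16.

It fits the input inside that box while preserving aspect ratio. `fit_resolution` is hypothetical, and ImageResizeKJ's own options may behave differently.

```python
def fit_resolution(src_w: int, src_h: int, max_w: int = 480, max_h: int = 832,
                   multiple: int = 16) -> tuple[int, int]:
    """Scale (src_w, src_h) to fit inside (max_w, max_h), keeping aspect ratio,
    then round each side down to a multiple of `multiple`.

    Uses integer arithmetic only, so no float rounding can shift a result.
    """
    if min(src_w, src_h, max_w, max_h, multiple) <= 0:
        raise ValueError("all sizes must be positive")
    if max_w * src_h <= max_h * src_w:
        # Width is the limiting side.
        w, h = max_w, src_h * max_w // src_w
    else:
        # Height is the limiting side.
        w, h = src_w * max_h // src_h, max_h
    w = max(w // multiple * multiple, multiple)
    h = max(h // multiple * multiple, multiple)
    return w, h


print(fit_resolution(1080, 1920))  # (464, 832)  portrait phone video
print(fit_resolution(1920, 1080))  # (480, 256)  landscape video
```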
Fun-Control Group
Input: Control signals, prompts, model parameters.
Key Nodes: WanVideoSampler, WanVideoControlEmbeds.
Reference Image Captioning Group
Input: Reference image.
Key Node: Joy_caption_two (generates descriptive text).
Post-Processing Group
Input: Raw generated frames.
Key Nodes: FILM VFI (interpolation), VHS_VideoCombine (final render).
5. Inputs & Outputs
Input Parameters:
Control video, resolution (default: 480x832), prompts, frame limit (default: 49).
Output:
Final video (MP4), optionally upscaled and interpolated.
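The default frame limit of 49 is not arbitrary. The Wan 2.1 video VAE compresses time by 4x, encoding the first frame on its own. It also compresses space by 8x into 16 latent channels. Valid frame counts therefore have the form 4n + 1. The sketch below checks a frame count and derives the latent grid under those assumptions. It also assumes the frame limit feeds the generation length directly. `latent_shape` and `snap_frame_count` are hypothetical helpers.

```python
def latent_shape(num_frames: int, width: int, height: int,
                 t_stride: int = 4, s_stride: int = 8,
                 channels: int = 16) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width).

    Assumes a causal video VAE that encodes the first frame on its own and
    then every group of `t_stride` frames into one latent frame.
    """
    if num_frames < 1 or (num_frames - 1) % t_stride != 0:
        raise ValueError(f"frame count must be {t_stride}n + 1, got {num_frames}")
    if width % s_stride or height % s_stride:
        raise ValueError(f"width and height must be multiples of {s_stride}")
    latent_frames = (num_frames - 1) // t_stride + 1
    return channels, latent_frames, height // s_stride, width // s_stride


def snap_frame_count(limit: int, t_stride: int = 4) -> int:
    """Largest valid frame count (t_stride * n + 1) not exceeding `limit`."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (limit - 1) // t_stride * t_stride + 1


print(latent_shape(49, 480, 832))  # (16, 13, 104, 60)
print(snap_frame_count(50))        # 49
```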
6. Notes & Tips
VRAM Requirement: A GPU with 16 GB or more of VRAM is recommended (e.g., RTX 3090 with 24 GB).
Dependencies: Install ComfyUI-WanVideoWrapper and ComfyUI-VideoHelperSuite manually.
Common Issues:
Missing model files: Ensure Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors is downloaded.
Resolution mismatch: Align input video and control map dimensions.
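The notes above can be turned into a quick pre-flight check, sketched below. It estimates the weight memory implied by the "14B" and "fp8_e4m3fn" parts of the model file name, using the nominal parameter count. It also looks for the two node packs and the model file. The folder layout (custom_nodes/ and models/diffusion_models/) is an assumption about a typical ComfyUI install. Adjust it to your setup. `weight_gib` and `preflight` are hypothetical names.

```python
from pathlib import Path

# Assumed ComfyUI layout; adjust paths to match your install.
REQUIRED_NODE_PACKS = ["ComfyUI-WanVideoWrapper", "ComfyUI-VideoHelperSuite"]
MODEL_FILE = "Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors"
MODEL_SUBDIR = Path("models") / "diffusion_models"


def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """GiB needed for the model weights alone (activations, VAE, text encoder not included)."""
    return num_params * bytes_per_param / 2**30


def preflight(comfy_root: Path) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for pack in REQUIRED_NODE_PACKS:
        if not (comfy_root / "custom_nodes" / pack).is_dir():
            problems.append(f"missing custom node pack: {pack}")
    model_path = comfy_root / MODEL_SUBDIR / MODEL_FILE
    if not model_path.is_file():
        problems.append(f"missing model file: {model_path}")
    return problems


if __name__ == "__main__":
    import sys

    # Nominal 14B parameters, taken from the "14B" in the model file name.
    for label, nbytes in [("fp16/bf16", 2), ("fp8_e4m3fn", 1)]:
        print(f"{label} weights: {weight_gib(14e9, nbytes):.1f} GiB")
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for issue in preflight(root) or ["all checks passed"]:
        print(issue)
```

The estimate gives about 26.1 GiB for fp16/bf16 weights versus 13.0 GiB in fp8. This is why the fp8 file is the one that fits the 16 GB recommendation, with the remainder left for activations, the text encoder, and the VAE.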