Unlock Advanced Video Depth Control with Wan Model-Based Workflow
1. Workflow Overview

This is a Wan model-based video depth control workflow specialized for video-to-video conversion. Key features:
Depth map extraction from video frames
Text-guided video stylization
Two-stage sampling pipeline
Automatic multilingual prompt translation
Core Models:
Wan 2.1 T2V 1.3B: Video-optimized base model
DepthAnythingV2: Depth preprocessor
Florence-2-base: For auto captioning
Wan Control LoRA: Depth adapter
2. Node Breakdown
Critical Components:
VHS_LoadVideo
Function: Load input video and extract frames
Requires: comfyui-videohelpersuite
Params: 16fps, 480x720 resolution
AIO_Preprocessor
Function: Depth extraction using DepthAnythingV2
Install: comfyui_controlnet_aux extension
Output: 512x512 normalized depth map
SamplerCustom (Dual-stage)
Process: 10-step high sigma + 15-step low sigma
Uses: Euler sampler
Special Dependencies:
wan_2.1_vae.safetensors: From Wan model hub
umt5_xxl_fp8: Multilingual text encoder
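The "normalized depth map" produced by the depth preprocessor can be pictured as a min-max rescale of raw depth values into the 0 to 1 range. The sketch below is only an illustration in plain Python: the function name normalize_depth and the sample values are hypothetical, and the actual AIO_Preprocessor node may normalize differently.

```python
def normalize_depth(depth):
    """Min-max rescale a 2D list of raw depth values into [0.0, 1.0]."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # Completely flat depth: avoid dividing by zero and return all zeros.
        return [[0.0 for _ in row] for row in depth]
    return [[(v - lo) / span for v in row] for row in depth]


# Hypothetical raw depth values for a tiny 2x2 frame.
raw = [[2.0, 4.0], [6.0, 10.0]]
norm = normalize_depth(raw)
print(norm)
```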
3. Workflow Structure
Group Logic:
Video Input Group:
Nodes: VHS_LoadVideo → ImageResizeKJ
Function: Frame loading & normalization
Depth Processing:
Nodes: AIO_Preprocessor → ImageScale
Output: Standardized depth maps
Generation Control:
Contains: UNETLoader + LoRA loader + TeaCache
Key: 0.8 strength depth LoRA
Two-Stage Sampling:
SplitSigmas → Dual SamplerCustom
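The two-stage split can be sketched in a few lines. This is a minimal illustration, assuming the split shares the boundary sigma between the two stages and that the 10 + 15 steps come from one 25-step schedule. The linear schedule and the helper name split_sigmas are hypothetical stand-ins, not the workflow's actual scheduler.

```python
def split_sigmas(sigmas, step):
    """Split a sigma schedule at `step`; both halves share the boundary value."""
    high = sigmas[: step + 1]  # first stage: high-sigma steps
    low = sigmas[step:]        # second stage: low-sigma steps
    return high, low


total_steps = 25  # 10 high-sigma steps + 15 low-sigma steps
# Hypothetical linear schedule from 1.0 down to 0.0.
# N sampling steps need N + 1 sigma values.
sigmas = [1.0 - i / total_steps for i in range(total_steps + 1)]

high, low = split_sigmas(sigmas, 10)
print(len(high) - 1, len(low) - 1)  # steps in each stage
```

Because the second stage starts at the exact sigma where the first one stopped, the second sampler continues denoising the first sampler's output rather than starting from fresh noise.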
4. Inputs & Outputs
Parameters:
Required: Input video (e.g. an .mp4 file)
Optional: Positive prompts (auto-translated)
Advanced: Depth control strength (0.08)
Output:
MP4 video (16fps, H.264)
Frame previews
Translated prompts
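Inside ComfyUI the video output node handles encoding. For re-encoding saved frames outside the workflow, a roughly equivalent ffmpeg call (16fps, H.264, CRF 19) could be assembled as below. This is a sketch under assumptions: the frame pattern, output path, and helper name build_encode_command are hypothetical, and the code only builds the command without running it.

```python
import shlex


def build_encode_command(frame_pattern, output_path, fps=16, crf=19):
    """Build an ffmpeg argument list that encodes numbered frames to H.264 MP4."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # input frame rate of the image sequence
        "-i", frame_pattern,
        "-c:v", "libx264",        # H.264 encoder
        "-crf", str(crf),         # lower CRF means higher quality, larger file
        "-pix_fmt", "yuv420p",    # widely compatible pixel format
        output_path,
    ]


# Hypothetical paths, for illustration only.
cmd = build_encode_command("frames/frame_%05d.png", "output.mp4")
print(shlex.join(cmd))
```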
5. Notes
Hardware: Minimum 12GB VRAM
Must install: VideoHelperSuite + ControlNet-Aux
Model paths: All Wan models in wan/ subfolder
Common issue: Frame rate mismatch causes audio sync problems
Tuning: Lower CRF (current 19) for better quality
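The frame rate issue is easy to quantify. As a rough sketch, assuming frames are carried over one-to-one without resampling, the drift between the video and its original audio is the difference between the two playback durations. The function name and the sample clip below are hypothetical.

```python
def audio_drift_seconds(frame_count, source_fps, output_fps=16.0):
    """Return how many seconds longer the output video runs than the source audio.

    A negative result means the output video is shorter than the audio.
    """
    source_duration = frame_count / source_fps
    output_duration = frame_count / output_fps
    return output_duration - source_duration


# Hypothetical clip: 240 frames recorded at 24 fps, written out at 16 fps.
drift = audio_drift_seconds(240, 24.0)
print(drift)  # 5.0
```

In this example the 10-second source becomes a 15-second video, so the audio ends 5 seconds early unless the input is resampled to 16fps on load.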