Unlock Stunning Video Generation with Style Control: A Comprehensive Workflow Guide
1. Workflow Overview

This workflow generates style-controlled video with Alibaba's Wan2.1-Fun InP model. InP stands for inpainting: given a start frame and an end frame, the model fills in the frames between them. Key features:
Start/End Frame Driven: Generates intermediate frames between two input images.
Block Swapping: WanVideoBlockSwap offloads a configurable number of transformer blocks to system RAM, which lowers VRAM use at some cost in speed.
Multimodal Support: Integrates CLIP vision encoding, T5 text encoding, and VAE decoding.
Core Models:
Wan2.1-Fun-InP-14B: 14B-parameter video model with FP8 quantization (VRAM-optimized).
umt5-xxl-enc: Multilingual T5 text encoder for complex prompts.
OpenCLIP-ViT-H: Vision encoder for image feature extraction.
2. Key Components
Critical Nodes:
WanVideoModelLoader:
Function: Loads the main model (Wan2.1-Fun-InP-14B_fp8_e4m3fn.safetensors).
Installation: Manually download and place in ComfyUI/models/wan_video/.
Dependency: FP8 requires NVIDIA Ampere/Ada GPUs.
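Before loading the workflow, a small script can confirm that the checkpoint sits where this guide says it should. This is a hypothetical helper: missing_models and REQUIRED_MODELS are names made up for illustration and are not part of ComfyUI. It only encodes the path given above, so add the text encoder and CLIP vision files under whatever folders your install expects.

```python
from pathlib import Path

# Hypothetical mapping of models/ subfolder -> expected files.
# Only the main checkpoint path comes from this guide; extend as needed.
REQUIRED_MODELS = {
    "wan_video": ["Wan2.1-Fun-InP-14B_fp8_e4m3fn.safetensors"],
}


def missing_models(comfy_root, required=REQUIRED_MODELS):
    """Return the expected model paths that do not exist under comfy_root/models."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for folder, filenames in required.items():
        for name in filenames:
            path = models_dir / folder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_models.py /path/to/ComfyUI
    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    for path in missing_models(root):
        print(f"missing: {path}")
```

An empty result means every listed file was found; each missing path is printed so it can be downloaded and placed before launching the workflow.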
WanVideoClipVisionEncode:
Function: Encodes start/end frames using OpenCLIP.
Model: open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors (from Hugging Face).
WanVideoSampler:
Function: Sets the number of sampling steps (30), the CFG scale (6), and skip layer guidance (slg_args).
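The CFG scale follows the standard classifier-free guidance formula: at each step the model predicts noise once with the prompt and once without, and the sampler extrapolates from the unconditional prediction toward the conditional one. This sketch uses plain Python lists as stand-ins for latent tensors and is not WanVideoSampler's actual code.

```python
def apply_cfg(cond, uncond, cfg_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond predictions must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy 3-element "noise predictions" standing in for real latent tensors.
cond = [0.5, -0.2, 0.1]    # prediction with the text prompt
uncond = [0.4, -0.1, 0.1]  # prediction with the empty/negative prompt
guided = apply_cfg(cond, uncond, cfg_scale=6.0)
print(guided)
```

At a scale of 1.0 the result equals the conditional prediction. A scale of 6 pushes six times as far along the difference between the two predictions, which strengthens prompt adherence but can produce oversaturated results if set too high.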
VHS_VideoCombine:
Function: Renders frame sequences to MP4/GIF (H.264 supported).
Installation: Requires the ComfyUI-VideoHelperSuite plugin.
3. Workflow Groups
Group Logic:
Group 1: Frame Processing
Input: Two images (loaded via LoadImage).
Process: Resize to 640x480 (ImageResizeKJ) and add labels (AddLabel).
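Both frames must end up at exactly the same size. ImageResizeKJ's own options are not covered here; as one possible approach (an assumption, not necessarily how the workflow's node is configured), each frame can be scaled until it covers 640x480 and then center-cropped, which preserves aspect ratio without stretching. The function names below are hypothetical.

```python
def cover_resize_plan(width, height, target_w=640, target_h=480):
    """Scale so the image covers target_w x target_h, then center-crop.

    Returns (scaled_size, crop_box); crop_box is (left, top, right, bottom)
    in the scaled image's coordinates.
    """
    scale = max(target_w / width, target_h / height)
    # max() guards against rounding leaving the image a pixel too small.
    new_w = max(target_w, round(width * scale))
    new_h = max(target_h, round(height * scale))
    left = (new_w - target_w) // 2
    top = (new_h - target_h) // 2
    return (new_w, new_h), (left, top, left + target_w, top + target_h)


def check_same_size(sizes):
    """Raise if the frames differ in size, since mismatched frames break concatenation."""
    if len(set(sizes)) > 1:
        raise ValueError(f"frame sizes differ: {sorted(set(sizes))}")


print(cover_resize_plan(1920, 1080))  # 16:9 start frame
print(cover_resize_plan(1024, 1024))  # square end frame
check_same_size([(640, 480), (640, 480)])  # passes silently
```

Cropping to cover trades a little of the frame edges for identical output sizes; padding (letterboxing) would keep the whole image but add borders the model would also try to animate.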
Group 2: Video Generation
Input: Encoded image features + text prompts.
Output: Latent sequence (decoded by WanVideoDecode).
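Wan2.1's VAE is described as compressing video 4x in time and 8x in height and width into 16 latent channels, with the first frame encoded on its own. That is why Wan frame counts are usually of the form 4k + 1 (for example 81). A small helper, assuming those ratios, shows the size of the latent sequence for the 640x480 frames used here; wan_latent_shape is a hypothetical name.

```python
def wan_latent_shape(num_frames, height, width, channels=16, t_stride=4, s_stride=8):
    """Return (C, T, H, W) of the latent for a clip, assuming Wan2.1-VAE ratios."""
    if (num_frames - 1) % t_stride != 0:
        raise ValueError(f"num_frames should be {t_stride}k + 1, got {num_frames}")
    if height % s_stride or width % s_stride:
        raise ValueError(f"height and width must be divisible by {s_stride}")
    # First frame gets its own latent; every following group of 4 frames shares one.
    latent_t = (num_frames - 1) // t_stride + 1
    return (channels, latent_t, height // s_stride, width // s_stride)


print(wan_latent_shape(81, 480, 640))  # (16, 21, 60, 80)
```

WanVideoDecode then maps each of those latent frames back to pixel frames, so an 81-frame request yields 21 latent frames during sampling.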
Group 3: Post-Processing
Output: Final video (MP4/GIF) + previews (with metadata).
4. Inputs & Outputs
Input Parameters:
Images: Start/end frames (recommended ≥480p).
Text Prompt: English descriptions (e.g., "digital wireframe video with lightsaber").
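Performance Args:
teacache_args: Speeds up sampling by reusing cached transformer outputs on steps where the model input changes little (threshold 0.3; higher values skip more steps).
experimental_args: Interpolation modes (2, 3).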
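TeaCache's threshold controls how often the transformer is actually evaluated. The sketch below shows the core caching decision in simplified form: the per-step change values are made-up inputs, whereas real TeaCache measures the relative L1 change of timestep-modulated inputs and rescales it with a fitted polynomial. teacache_plan is a hypothetical name, not the node's API.

```python
def teacache_plan(rel_l1_changes, threshold=0.3):
    """Decide per step whether to run the transformer or reuse the cached output.

    rel_l1_changes[i] is the relative L1 change of the model input between
    step i and step i + 1. Changes accumulate; once the total reaches the
    threshold, the step is recomputed and the accumulator resets.
    """
    plan = [True]  # step 0 is always computed: nothing is cached yet
    accumulated = 0.0
    for change in rel_l1_changes:
        accumulated += change
        if accumulated < threshold:
            plan.append(False)  # small drift: reuse the cached result
        else:
            plan.append(True)   # drift too large: recompute and reset
            accumulated = 0.0
    return plan


print(teacache_plan([0.1, 0.1, 0.15, 0.05]))  # [True, False, False, True, False]
```

Raising the threshold lets more change accumulate before a recompute, so more steps are skipped: faster sampling at the cost of drifting further from the uncached result.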
Output:
Video Files: MP4 (H.264) and GIF formats.
Previews: Labeled frame comparisons.
5. Notes
VRAM: Minimum 16GB (24GB for FP8).
Compatibility:
Switch to the 1.3B model if VRAM is insufficient (edit WanVideoModelLoader).
Missing models trigger console download prompts.
Debugging:
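Frames of different sizes crash ImageConcatMulti, so make sure both inputs are resized to the same dimensions.
Prefer English prompts, as in the example prompt above. (umt5-xxl is itself a multilingual encoder, so other languages are not strictly unsupported.)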
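The VRAM notes can be sanity-checked with simple arithmetic: weight memory is roughly parameter count times bytes per parameter. This sketch covers the weights alone and takes the nominal 14B and 1.3B counts at face value. Activations, the umt5-xxl text encoder, the CLIP vision model, and the VAE all need memory on top, which is why the practical minimum is higher than the weight size. weight_memory_gb is a hypothetical helper.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Bytes per parameter: FP8 = 1, FP16/BF16 = 2.
for label, params, nbytes in [
    ("14B FP8", 14e9, 1),
    ("14B BF16", 14e9, 2),
    ("1.3B BF16", 1.3e9, 2),
]:
    print(f"{label}: ~{weight_memory_gb(params, nbytes):.1f} GB")
```

The gap between roughly 14 GB of FP8 weights and roughly 2.6 GB for the 1.3B model in BF16 is why switching models is the main fallback when VRAM runs short.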