Unlock Stunning Video Generation with Style Control: A Comprehensive Workflow Guide

ComfyUI.org
2025-04-04 08:22:07

1. Workflow Overview


This workflow generates style-controlled videos with the InP (inpainting) variant of Alibaba's Wan2.1-Fun model, which fills in detailed frames between given keyframes. Key features:

  • Start/End Frame Driven: Generates intermediate frames between two input images.

  • FunControl: Style-controlled generation, with the WanVideoBlockSwap node offloading transformer blocks to system RAM to lower VRAM use.

  • Multimodal Support: Integrates CLIP vision encoding, T5 text encoding, and VAE decoding.
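Start/end frame generation can be pictured as inpainting along the time axis. The two input images pin the first and last frames, and the model fills in everything between them. The minimal sketch below shows only that arrangement. It uses a hypothetical helper, `keyframe_conditioning`, and placeholder strings instead of images. The wrapper's real tensor layout and mask convention are assumptions and are not modeled here.

```python
from typing import List, Optional, Tuple, TypeVar

T = TypeVar("T")


def keyframe_conditioning(
    start: T, end: T, num_frames: int
) -> Tuple[List[Optional[T]], List[int]]:
    """Hypothetical helper: place the two input images at the ends of a frame sequence.

    Returns the conditioning frames (None = frame the model must generate)
    and a mask where 1 marks a known frame and 0 marks a frame to fill in.
    """
    if num_frames < 2:
        raise ValueError("need at least 2 frames for a start/end pair")
    frames: List[Optional[T]] = [None] * num_frames
    frames[0] = start  # first frame is fixed to the start image
    frames[-1] = end   # last frame is fixed to the end image
    mask = [1] + [0] * (num_frames - 2) + [1]
    return frames, mask


if __name__ == "__main__":
    frames, mask = keyframe_conditioning("start.png", "end.png", 5)
    print(frames)  # ['start.png', None, None, None, 'end.png']
    print(mask)    # [1, 0, 0, 0, 1]
```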

Core Models:

  • Wan2.1-Fun-InP-14B: 14B-parameter video model with FP8 quantization (VRAM-optimized).

  • umt5-xxl-enc: Multilingual T5 text encoder for complex prompts.

  • OpenCLIP-ViT-H: Vision encoder for image feature extraction.


2. Key Components

Critical Nodes:

  1. WanVideoModelLoader:

    • Function: Loads the main model (Wan2.1-Fun-InP-14B_fp8_e4m3fn.safetensors).

    • Installation: Manually download and place in ComfyUI/models/wan_video/.

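    • Dependency: FP8 weights reduce VRAM use. Native FP8 compute requires an NVIDIA Ada Lovelace (RTX 40-series) or newer GPU. Older GPUs such as Ampere can still load the weights but compute in higher precision.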

  2. WanVideoClipVisionEncode:

    • Function: Encodes start/end frames using OpenCLIP.

    • Model: open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors (available on Hugging Face).

  3. WanVideoSampler:

    • Function: Controls sampling steps (30), CFG scale (6), and skip-layer guidance (slg_args).

  4. VHS_VideoCombine:

    • Function: Renders frame sequences to MP4/GIF (H.264 supported).

    • Installation: Requires the ComfyUI-VideoHelperSuite custom nodes.
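The CFG scale of 6 in WanVideoSampler controls how strongly the prompt steers each denoising step. The sketch below shows the standard classifier-free guidance formula applied to plain Python lists. It assumes the sampler uses this textbook combination. The real node works on latent tensors, and this sketch does not model slg_args.

```python
from typing import List, Sequence


def apply_cfg(uncond: Sequence[float], cond: Sequence[float], scale: float) -> List[float]:
    """Classifier-free guidance: start from the unconditional prediction and
    move `scale` times the distance toward the prompt-conditioned prediction."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0, 0.5]  # prediction with the empty/negative prompt
    cond = [1.0, 1.0, 0.0]    # prediction with the positive prompt
    print(apply_cfg(uncond, cond, 6.0))  # [6.0, 1.0, -2.5]
```

A scale of 1.0 simply returns the conditional prediction. Larger values amplify the difference between the two predictions. This is why high CFG settings follow the prompt more literally but can oversaturate results. Standard CFG also needs both a conditional and an unconditional prediction at every step.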


3. Workflow Groups

Group Logic:

  • Group 1: Frame Processing

    • Input: Two images (loaded via LoadImage).

    • Process: Resize both frames to 640x480 (ImageResizeKJ) and add text labels (AddLabel).

  • Group 2: Video Generation

    • Input: Encoded image features + text prompts.

    • Output: Latent sequence (decoded by WanVideoDecode).

  • Group 3: Post-Processing

    • Output: Final video (MP4/GIF) + previews (with metadata).
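Group 1 resizes both frames to the same 640x480 size before they are labeled and combined. The stdlib-only sketch below illustrates why that matters. It uses hypothetical helpers (`resize_nearest`, `concat_horizontal`) on images stored as nested lists of RGB tuples. For brevity, it uses nearest-neighbor sampling and stretches the image without preserving aspect ratio. That is an assumption, not necessarily how ImageResizeKJ behaves. `concat_horizontal` only stands in for a concatenation step such as ImageConcatMulti.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # image[y][x], rows of RGB pixels


def size(img: Image) -> Tuple[int, int]:
    """Return (width, height)."""
    return len(img[0]), len(img)


def resize_nearest(img: Image, width: int, height: int) -> Image:
    """Nearest-neighbor resize to exactly width x height (aspect ratio is not kept)."""
    src_w, src_h = size(img)
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def concat_horizontal(a: Image, b: Image) -> Image:
    """Place two frames side by side; fails if their sizes differ."""
    if size(a) != size(b):
        raise ValueError(f"frame size mismatch: {size(a)} vs {size(b)}")
    return [row_a + row_b for row_a, row_b in zip(a, b)]


if __name__ == "__main__":
    start = [[(255, 0, 0)] * 800 for _ in range(600)]  # 800x600 red frame
    end = [[(0, 0, 255)] * 1024 for _ in range(768)]   # 1024x768 blue frame
    a, b = resize_nearest(start, 640, 480), resize_nearest(end, 640, 480)
    print(size(concat_horizontal(a, b)))  # (1280, 480)
```

If you skip the resize and call `concat_horizontal(start, end)` directly, it raises the size-mismatch error, mirroring the failure described in the Notes.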


4. Inputs & Outputs

Input Parameters:

  • Images: Start/end frames (recommended ≥480p).

  • Text Prompt: English descriptions (e.g., "digital wireframe video with lightsaber").

  • Performance Args:

    • teacache_args: TeaCache step caching, which skips redundant model evaluations to speed up sampling (threshold 0.3).

    • experimental_args: Interpolation modes (2,3).
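TeaCache speeds up sampling by skipping some transformer forward passes. When the model input has changed little since the last fully computed step, it reuses a cached result instead. Below is a simplified sketch of that decision rule. It assumes the 0.3 in teacache_args is a relative-L1 threshold of this kind. The published method measures change on timestep-modulated inputs and rescales it with a model-specific polynomial. Both details are omitted here, and the one-number "inputs" are made up for illustration.

```python
from typing import List, Sequence


def rel_l1(curr: Sequence[float], prev: Sequence[float]) -> float:
    """Relative L1 change between two consecutive steps' model inputs."""
    den = sum(abs(p) for p in prev)
    if den == 0:
        return float("inf")  # no reference magnitude: treat as a large change
    return sum(abs(c - p) for c, p in zip(curr, prev)) / den


def teacache_plan(step_inputs: Sequence[Sequence[float]], threshold: float = 0.3) -> List[bool]:
    """True = run the full model at this step, False = reuse the cached result."""
    plan: List[bool] = []
    accumulated = 0.0
    last = len(step_inputs) - 1
    for i, x in enumerate(step_inputs):
        if i == 0 or i == last:
            compute = True  # always compute the first and last steps
        else:
            # Add up the change since the last computed step.
            accumulated += rel_l1(x, step_inputs[i - 1])
            compute = accumulated >= threshold
        if compute:
            accumulated = 0.0  # reset after a full forward pass
        plan.append(compute)
    return plan


if __name__ == "__main__":
    inputs = [[1.0], [1.1], [1.2], [2.0], [2.1]]
    print(teacache_plan(inputs, threshold=0.3))  # [True, False, False, True, True]
```

A higher threshold lets more change accumulate before a full pass, so more steps are skipped. This makes generation faster at some cost in fidelity.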

Output:

  • Video Files: MP4 (H.264) and GIF formats.

  • Previews: Labeled frame comparisons.


5. Notes

  • VRAM: Minimum 16 GB; 24 GB recommended for the 14B FP8 model.

  • Compatibility:

    • If VRAM is insufficient, switch to the 1.3B model in WanVideoModelLoader.

    • Missing models trigger console download prompts.

  • Debugging:

    • ImageConcatMulti fails if the two frames differ in size, so keep both at the same resolution.

    • Prefer English prompts, as in the example above; note that umt5-xxl itself is a multilingual encoder.
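To check VRAM before choosing a model, you can query `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which prints one MiB value per GPU. The sketch below parses that output and maps it to a model size. The thresholds are an assumption based on one reading of the Notes: 24 GB or more for the 14B FP8 model, the 1.3B model from 16 GB, and nothing below the stated 16 GB minimum. The helper names are hypothetical, so adjust the thresholds to your own measurements.

```python
import shutil
import subprocess
from typing import List, Optional


def parse_vram_mib(nvidia_smi_output: str) -> List[int]:
    """Parse one total-memory value (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def pick_model(vram_mib: int) -> Optional[str]:
    """Map total VRAM to a model size (assumed thresholds from the Notes)."""
    gib = round(vram_mib / 1024)  # reported totals can fall slightly short of the nominal size
    if gib >= 24:
        return "14B (FP8)"
    if gib >= 16:
        return "1.3B"
    return None  # below the stated 16 GB minimum


if __name__ == "__main__":
    result = None
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True,
        )
    if result is None or result.returncode != 0:
        print("Could not query nvidia-smi; check VRAM manually.")
    else:
        for idx, mib in enumerate(parse_vram_mib(result.stdout)):
            print(f"GPU {idx}: {mib} MiB -> {pick_model(mib) or 'below the 16 GB minimum'}")
```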