Transforming Static Images into Cinematic Explosions with Wan2.1

ComfyUI.org
2025-05-19 08:55:26

1. Workflow Overview

This workflow uses the Wan2.1 video generation model to turn a static image (e.g., a mecha character) into a short, high-motion clip with explosions, fire, and debris. Key features:

  • Image-to-Video: Animates the input image and adds effects such as explosions.

  • Multimodal Control: Combines text prompts (e.g., "intense explosion + terrified expression") with image semantics.

  • Professional Optimization: Supports tiled rendering and VRAM management for HD output.

Core Models:

  • Wan2.1-I2V-14B: Main video model (14B params, 480P output).

  • UMT5-XXL Text Encoder: Processes Chinese/English prompts.

  • Explosion LoRA: WAN2.1 ZOEY Explosion I2V_Alpha, which improves the physical detail of the explosion effects.


2. Key Components

Critical Nodes:

  1. WanVideoModelLoader:

    • Loads Wan2.1 model (Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors).

    • Supports bf16/fp8 precision and VRAM optimization.

  2. WanVideoTextEncode:

    • Takes bilingual prompts (e.g., "baozha, violent explosion behind the character..."); "baozha" is the pinyin for the Chinese word for "explosion".

    • Uses UMT5-XXL for text embeddings.

  3. WanVideoLoraSelect:

    • Loads explosion LoRA (default weight: 1.0).

  4. WanVideoSampler:

    • Key sampler settings:

      • Steps: 20

      • Sampler: dpm++_sde (chosen for this workflow because it handles motion well).

      • Seed: 1057359483639287 (fixed, so runs with the same settings are reproducible).

  5. VHS_VideoCombine:

    • Assembles the decoded frames into the final video (MP4 or GIF, 16 fps, CRF 19).
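
The sampler and output settings listed above can be kept in one place so that runs are easy to compare. The following is a minimal sketch and not part of the workflow itself: the RenderSettings class and the clip_seconds helper are hypothetical names, and the 81-frame count is an assumed example value that this guide does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderSettings:
    """Values copied from the node list above (hypothetical container)."""
    steps: int = 20
    sampler: str = "dpm++_sde"
    seed: int = 1057359483639287
    fps: int = 16
    crf: int = 19


def clip_seconds(num_frames: int, fps: int) -> float:
    """Length of the finished clip in seconds for a given frame count."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


settings = RenderSettings()
# 81 frames is an assumed example, not a value taken from the workflow.
print(f"{clip_seconds(81, settings.fps):.2f} s at {settings.fps} fps")
```

At 16 fps, every additional second of video costs 16 more frames to sample, which is worth keeping in mind when VRAM is tight.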

Dependencies:

  • Install the ComfyUI-WanVideoWrapper and ComfyUI-VideoHelperSuite custom nodes (the ImageResizeKJ node used for resizing comes from ComfyUI-KJNodes).

  • Download models from LibLibAI to:

    • Main model: ComfyUI/models/wan_video

    • LoRA: ComfyUI/models/loras
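
Before opening the workflow, it can help to confirm that the downloaded files ended up in the folders listed above. The script below is a small sketch under stated assumptions: find_problems is a hypothetical helper, the ComfyUI root is assumed to be a folder named ComfyUI in the current directory, and because this guide does not give the LoRA's file name, the check only looks for any .safetensors file in the LoRA folder.

```python
from pathlib import Path

# File name taken from the WanVideoModelLoader entry in section 2.
MAIN_MODEL = "Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors"


def find_problems(comfy_root):
    """Return a list of readable problems; an empty list means the layout looks fine."""
    root = Path(comfy_root)
    problems = []

    main_model = root / "models" / "wan_video" / MAIN_MODEL
    if not main_model.is_file():
        problems.append(f"main model not found: {main_model}")

    # The LoRA file name is not stated in this guide, so only check that
    # at least one .safetensors file exists in the LoRA folder.
    lora_dir = root / "models" / "loras"
    if not (lora_dir.is_dir() and any(lora_dir.glob("*.safetensors"))):
        problems.append(f"no .safetensors LoRA found in: {lora_dir}")

    return problems


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; adjust as needed.
    for problem in find_problems("ComfyUI"):
        print(problem)
```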


3. Workflow Structure

  1. Model Loading:

    • Load video model, text encoder, VAE, and LoRA.

  2. Input Processing:

    • Resize input image (e.g., mecha_valkyrie.png to 832x832) → CLIP vision encode.

  3. Video Generation:

    • Fuse text/image embeddings → Sampler → Latent frames.

  4. Output:

    • Decode latents → Combine frames into video → Save as MP4 (default filename prefix: WanVideo2_1_T2V).
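
The four stages above normally run from the ComfyUI interface, but a workflow exported in API format can also be queued by POSTing it to the server's /prompt endpoint. The sketch below assumes a local server at 127.0.0.1:8188 and assumes the sampler node exposes an input named seed; with_seed and build_request are hypothetical helpers, and the node IDs in the usage lines are made up for illustration.

```python
import copy
import json
import urllib.request


def with_seed(workflow, seed, class_type="WanVideoSampler"):
    """Return a copy of an API-format workflow with the sampler seed replaced."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == class_type:
            # Assumption: the sampler's seed input is called "seed".
            node.setdefault("inputs", {})["seed"] = seed
    return patched


def build_request(workflow, server="http://127.0.0.1:8188"):
    """Build (but do not send) the POST request that queues a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative two-node fragment; a real export contains every node.
example = {
    "1": {"class_type": "WanVideoSampler", "inputs": {"steps": 20, "seed": 0}},
    "2": {"class_type": "VHS_VideoCombine", "inputs": {}},
}
request = build_request(with_seed(example, 1057359483639287))
# urllib.request.urlopen(request) would send it to a running server.
```

Building the request separately from sending it makes it easy to inspect the payload before anything reaches the GPU.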


4. Input & Output

Input Parameters:

  • Image: Recommended resolution ≥832x832 (e.g., mecha valkyrie image).

  • Text Prompt: Must include explosion keywords (Chinese/English).

  • Seed: Random or fixed (e.g., 1057359483639287).
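
The three input requirements above are easy to check before queuing a run. The helpers below are a minimal sketch with hypothetical names; the keyword list is an assumption based on the example prompt in section 2, and the 32-bit range for random seeds is a conservative choice made here, not a limit stated by the workflow.

```python
import random

# Assumed keyword list: the pinyin and English terms from the example
# prompt, plus the Chinese word for "explosion".
EXPLOSION_KEYWORDS = ("baozha", "explosion", "爆炸")


def has_explosion_keyword(prompt, keywords=EXPLOSION_KEYWORDS):
    """True if the prompt contains at least one explosion keyword."""
    lowered = prompt.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def meets_min_resolution(width, height, minimum=832):
    """True if both sides are at least the recommended 832 pixels."""
    return width >= minimum and height >= minimum


def pick_seed(fixed=None):
    """Return the fixed seed if one is given, otherwise a random one."""
    if fixed is not None:
        return fixed
    return random.randrange(2**32)  # conservative range, an assumption


print(has_explosion_keyword("baozha, violent explosion behind the character"))
print(meets_min_resolution(1024, 768))
print(pick_seed(1057359483639287))
```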

Output:

  • 480P MP4 video with explosion effects (saved to ComfyUI/output).

  • Example output description:

    "Violent explosion behind the character, flying debris, terrified expression, vibrant colors."


5. Notes

  • VRAM: ≥16GB required (24GB recommended for HD rendering).

  • Compatibility:

    • Only works with Wan2.1 models (not compatible with Stable Diffusion).

    • LoRAs must be built for the Wan2.1-I2V model.

  • Troubleshooting:

    • If ComfyUI-WanVideoWrapper is not installed, its nodes fail to load and the workflow reports missing-node errors.

    • Adjust the ImageResizeKJ settings if the input image isn't square.
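
For non-square inputs, an alternative to changing the resize node is to crop the image to a centered square before loading it. The function below is a sketch of that arithmetic only; center_square_crop is a hypothetical helper, and the returned box uses the (left, upper, right, lower) order that Pillow's Image.crop expects.

```python
def center_square_crop(width, height):
    """Return the (left, upper, right, lower) box of the largest centered square."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


# A 1920x1080 frame keeps its full height and loses 420 px on each side.
print(center_square_crop(1920, 1080))  # (420, 0, 1500, 1080)
```

The cropped square can then be resized to 832x832 as described in section 3 without distorting the subject.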