Create Stunning Animated Videos with Ease: A Flux.1 and WanVideo Tutorial

ComfyUI.org
2025-03-17 10:59:19

Workflow Overview


This workflow combines the Flux.1 image model with the WanVideo plugin. It first generates an image from a text prompt, then animates that image into a short video. Its three main stages are:

  1. Generate a high-quality image from a text prompt with Flux.1.

  2. Animate that image into a short video with WanVideo, keeping the scene and camera steady while specific elements (e.g., a person) move.

  3. Export the result as an MP4 video.
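You can also run the pipeline without the browser UI. ComfyUI's built-in HTTP server accepts a workflow exported in API (JSON) format through a POST request to /prompt. The minimal sketch below assumes a local server on ComfyUI's default port 8188. The client_id value is an arbitrary placeholder, and the helper function names are hypothetical.

```python
import json
import urllib.request


def build_prompt_request(graph, server="http://127.0.0.1:8188", client_id="tutorial"):
    """Wrap an API-format workflow dict in a POST /prompt request (not sent yet)."""
    body = json.dumps({"prompt": graph, "client_id": client_id}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_prompt(graph, server="http://127.0.0.1:8188"):
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    with urllib.request.urlopen(build_prompt_request(graph, server)) as resp:
        return json.load(resp)
```

For example, `queue_prompt(json.load(open("workflow_api.json")))` (the file name is hypothetical) queues a workflow you exported in API format. The reply includes the ID of the queued prompt.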

Core Models

  • Flux.1 (flux1-dev.sft): Black Forest Labs' 12-billion-parameter text-to-image model, used here to generate the detailed base image.

  • LoRA (梦幻粘土世界_v1.0.safetensors, “Dreamy Clay World”): Adapts Flux.1 to a dreamy clay-figure style.

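  • Text encoders (DualCLIPLoader): Loads the two encoders Flux.1 uses to process the prompt. The workflow lists them as runwayml and sd3/clip_l. Flux.1 normally pairs a T5-XXL encoder with CLIP-L.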

  • VAE (ae.sft): Encodes/decodes images for Flux.1.

  • WanVideo Model (wan2.1_i2v_480p_14B_bf16_Comfy-Org.safetensors): Wan 2.1 image-to-video model (14B parameters, 480p, bf16 weights).

  • WanVideo T5 (umt5-xxl-enc-bf16.safetensors): UMT5-XXL text encoder that turns the animation prompt into embeddings for the Wan model.

  • WanVideo CLIP (open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors): Encodes the input image into CLIP embeddings that condition the video.

  • WanVideo VAE (Wan2_1_VAE_bf16.safetensors): Wan 2.1's video VAE. It encodes the input image into video latents and decodes the generated latents into frames.
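Most failures in this workflow come from missing or misplaced model files, so a quick pre-flight check is worthwhile. The function below is a minimal sketch with a hypothetical name. The subfolder assigned to each file is an assumption based on common ComfyUI conventions, and individual loader nodes may look elsewhere depending on version. The two Flux.1 text encoder files are left out because the workflow identifies them only by folder.

```python
from pathlib import Path

# ASSUMPTION: subfolder of ComfyUI/models/ each file is expected in.
# Adjust these to match where your loader nodes actually look.
EXPECTED_FILES = {
    "diffusion_models": [
        "flux1-dev.sft",
        "wan2.1_i2v_480p_14B_bf16_Comfy-Org.safetensors",
    ],
    "loras": ["梦幻粘土世界_v1.0.safetensors"],
    "vae": ["ae.sft", "Wan2_1_VAE_bf16.safetensors"],
    "text_encoders": ["umt5-xxl-enc-bf16.safetensors"],
    "clip_vision": ["open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors"],
}


def missing_model_files(comfy_root, expected=EXPECTED_FILES):
    """Return 'subfolder/filename' entries that are not files under comfy_root/models."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for subfolder, names in expected.items():
        for name in names:
            if not (models_dir / subfolder / name).is_file():
                missing.append(f"{subfolder}/{name}")
    return missing
```

For example, `missing_model_files("/path/to/ComfyUI")` returns an empty list when every listed file is in place.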

Component Explanation

  1. UNETLoader: Loads the Flux.1 model.

    • Installation: Default ComfyUI node.

  2. DualCLIPLoader: Loads the two text encoders used by Flux.1.

    • Installation: Default ComfyUI node.

  3. VAELoader: Loads Flux.1’s VAE.

    • Installation: Default ComfyUI node.

  4. CLIPTextEncode: Encodes the positive prompt.

    • Installation: Default ComfyUI node.

  5. FluxGuidance: Sets the guidance strength for Flux.1 (3.5 in this workflow).

    • Installation: Default ComfyUI node.

  6. BasicGuider: Combines the model and the conditioning into the guider used by SamplerCustomAdvanced.

    • Installation: Default ComfyUI node.

  7. SamplerCustomAdvanced: Runs the denoising loop that produces the image latent.

    • Installation: Default ComfyUI node.

  8. VAEDecode: Decodes the latent into a pixel image.

    • Installation: Default ComfyUI node.

  9. LoraLoader: Applies the clay-style LoRA to the model and the text encoder.

    • Installation: Default ComfyUI node.

  10. WanVideoModelLoader: Loads the Wan 2.1 image-to-video model.

    • Installation: Install the WanVideo plugin (ComfyUI-WanVideoWrapper) via ComfyUI Manager, and download the model files from their official source.

  11. LoadWanVideoT5TextEncoder: Loads the UMT5-XXL text encoder.

    • Installation: WanVideo plugin.

  12. LoadWanVideoClipTextEncoder: Loads the WanVideo CLIP encoder.

    • Installation: WanVideo plugin.

  13. WanVideoVAELoader: Loads the WanVideo VAE.

    • Installation: WanVideo plugin.

  14. WanVideoTextEncode: Encodes the animation prompt.

    • Installation: WanVideo plugin.

  15. WanVideoImageClipEncode: Encodes the input image into embeddings that condition the sampler.

    • Installation: WanVideo plugin.

  16. WanVideoSampler: Generates the video latents from the text and image embeddings.

    • Installation: WanVideo plugin.

  17. WanVideoDecode: Decodes the video latents into frames.

    • Installation: WanVideo plugin.

  18. VHS_VideoCombine: Combines the frames into an MP4 video.

    • Installation: Install via ComfyUI Manager (VideoHelperSuite plugin).
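Before loading the workflow, you can confirm that every node listed above is installed. A running ComfyUI server describes its registered node classes at GET /object_info. This minimal sketch assumes a local server on port 8188. It groups any missing node classes by the package that provides them. The function names are hypothetical.

```python
import json
import urllib.request

# Node class names from the component list, grouped by source package.
REQUIRED_NODES = {
    "core": ["UNETLoader", "DualCLIPLoader", "VAELoader", "CLIPTextEncode",
             "FluxGuidance", "BasicGuider", "SamplerCustomAdvanced",
             "VAEDecode", "LoraLoader"],
    "WanVideo plugin": ["WanVideoModelLoader", "LoadWanVideoT5TextEncoder",
                        "LoadWanVideoClipTextEncoder", "WanVideoVAELoader",
                        "WanVideoTextEncode", "WanVideoImageClipEncode",
                        "WanVideoSampler", "WanVideoDecode"],
    "VideoHelperSuite": ["VHS_VideoCombine"],
}


def fetch_object_info(server="http://127.0.0.1:8188"):
    """Download the node registry a running ComfyUI server exposes at /object_info."""
    with urllib.request.urlopen(f"{server}/object_info") as resp:
        return json.load(resp)


def missing_nodes(object_info, required=REQUIRED_NODES):
    """Map each package to the required node classes absent from object_info."""
    report = {}
    for package, names in required.items():
        absent = [name for name in names if name not in object_info]
        if absent:
            report[package] = absent
    return report
```

`print(missing_nodes(fetch_object_info()))` prints `{}` when all three packages are installed. Otherwise, the printed dictionary shows which plugin still needs to be added through ComfyUI Manager.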

Workflow Structure

  1. Text-to-Image Base Group

    • Nodes: UNETLoader → LoraLoader → CLIPTextEncode → FluxGuidance → BasicGuider → SamplerCustomAdvanced → VAEDecode

    • Role: Generates a dreamy clay-style image.

    • Inputs: Prompt (e.g., “A miniature coffee factory…”), guidance strength (3.5), steps (20).

    • Outputs: A 1024x1024 image.

  2. Wan Image-to-Video Group

    • Nodes: WanVideoModelLoader → WanVideoTextEncode → WanVideoImageClipEncode → WanVideoSampler → WanVideoDecode → VHS_VideoCombine

    • Role: Animates the image into a short video.

    • Inputs: Image, animation prompt (e.g., “change this photo into animation…”), frames (10), steps (6).

    • Outputs: MP4 short video.
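The Text-to-Image Base Group can be written in ComfyUI's API (JSON) format. In this format, each node is keyed by an ID, and a link is a [node_id, output_index] pair. This is a sketch, not the author's exported workflow. The following are all assumptions:

  • The node IDs.
  • The two CLIP file names.
  • The LoRA strength.
  • The helper nodes that SamplerCustomAdvanced needs (RandomNoise, KSamplerSelect, BasicScheduler, EmptySD3LatentImage) and their settings.

SaveImage is added only so the group can run on its own.

```python
import json

# HYPOTHETICAL file names: the workflow identifies its two text encoders
# only as "runwayml" and "sd3/clip_l"; substitute the files you actually have.
CLIP_1 = "t5xxl_fp16.safetensors"
CLIP_2 = "clip_l.safetensors"


def flux_text_to_image_graph(prompt, seed=0, guidance=3.5, steps=20,
                             width=1024, height=1024, lora_strength=1.0):
    """Build the text-to-image group as a ComfyUI API-format dict."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-dev.sft", "weight_dtype": "default"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": CLIP_1, "clip_name2": CLIP_2, "type": "flux"}},
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.sft"}},
        # LoraLoader outputs: 0 = MODEL, 1 = CLIP (both patched by the LoRA).
        "4": {"class_type": "LoraLoader",
              "inputs": {"model": ["1", 0], "clip": ["2", 0],
                         "lora_name": "梦幻粘土世界_v1.0.safetensors",
                         "strength_model": lora_strength,
                         "strength_clip": lora_strength}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["4", 1]}},
        "6": {"class_type": "FluxGuidance",
              "inputs": {"conditioning": ["5", 0], "guidance": guidance}},
        "7": {"class_type": "BasicGuider",
              "inputs": {"model": ["4", 0], "conditioning": ["6", 0]}},
        # ASSUMED helper nodes and settings feeding SamplerCustomAdvanced.
        "8": {"class_type": "RandomNoise", "inputs": {"noise_seed": seed}},
        "9": {"class_type": "KSamplerSelect", "inputs": {"sampler_name": "euler"}},
        "10": {"class_type": "BasicScheduler",
               "inputs": {"model": ["4", 0], "scheduler": "simple",
                          "steps": steps, "denoise": 1.0}},
        "11": {"class_type": "EmptySD3LatentImage",
               "inputs": {"width": width, "height": height, "batch_size": 1}},
        "12": {"class_type": "SamplerCustomAdvanced",
               "inputs": {"noise": ["8", 0], "guider": ["7", 0],
                          "sampler": ["9", 0], "sigmas": ["10", 0],
                          "latent_image": ["11", 0]}},
        "13": {"class_type": "VAEDecode",
               "inputs": {"samples": ["12", 0], "vae": ["3", 0]}},
        # Output node added so the group can be queued by itself.
        "14": {"class_type": "SaveImage",
               "inputs": {"images": ["13", 0], "filename_prefix": "flux_clay"}},
    }


if __name__ == "__main__":
    graph = flux_text_to_image_graph("A miniature coffee factory")
    print(json.dumps(graph, indent=2))
```

In the full workflow, the VAEDecode output would feed the Wan image-to-video group instead of SaveImage.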

Inputs and Outputs

  • Inputs:

    • Positive prompt: “A miniature coffee factory where tiny baristas are brewing espresso…”.

    • Animation prompt: “change this photo into animation, keep the whole image and camera steady…”.

    • Resolution: 1024x1024 (image), 272x272 (video).

    • Seed: Fixed or random.

  • Outputs: A 10-frame MP4 video at 16 fps.
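The frame count, frame rate, and resolution determine the clip length and the size of the latent the sampler works on. This sketch makes three assumptions:

  • The Wan 2.1 VAE compresses 8x spatially and 4x temporally.
  • The first frame is encoded on its own, so 4k + 1 input frames become k + 1 latent frames.
  • The latent has 16 channels.

Under those assumptions, 10 frames is not of the form 4k + 1. The source does not say how the WanVideo nodes handle such a count. The function name is hypothetical.

```python
def video_stats(num_frames, fps, width, height):
    """Clip duration and the latent shape under the assumed Wan 2.1 VAE compression."""
    # ASSUMPTION: width and height must be multiples of 16 (8x VAE downsampling
    # followed by 2x2 patches); 272 = 17 * 16 satisfies this.
    if width % 16 or height % 16:
        raise ValueError("width and height are assumed to be multiples of 16")
    latent_frames = 1 + (num_frames - 1) // 4
    return {
        "duration_s": num_frames / fps,
        # (channels, latent frames, latent height, latent width)
        "latent_shape": (16, latent_frames, height // 8, width // 8),
        # Largest frame count of the form 4k + 1 that does not exceed num_frames.
        "aligned_frames": 4 * ((num_frames - 1) // 4) + 1,
    }
```

With this workflow's settings, `video_stats(10, 16, 272, 272)` gives a 0.625-second clip, a latent of shape (16, 3, 34, 34), and 9 as the nearest aligned frame count.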

Notes and Considerations

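  • Errors: Most loading errors come from model files that are missing, misnamed, or in the wrong folder. Check that each WanVideo loader node points to an existing file.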

  • Performance: Load the models in bf16 or fp8 precision, and choose offload_device as the load device to reduce GPU memory usage.

  • Compatibility: The WanVideo and VideoHelperSuite (VHS) plugins need a recent ComfyUI version, so update ComfyUI before installing them.

  • Resources: At least 16 GB of GPU memory is recommended. Keep the frame count and step count low to stay within memory.
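The precision advice follows from simple arithmetic: the weights alone take (number of parameters) × (bytes per parameter). This sketch counts only the 14B WanVideo model's weights. It ignores activations, the text encoders, and the VAE, so real usage is higher.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def weight_gib(num_params, dtype):
    """Memory needed just to hold the weights, in GiB (activations excluded)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in ("bf16", "fp8"):
        print(f"14B model in {dtype}: {weight_gib(14e9, dtype):.1f} GiB")
```

The bf16 weights come to about 26.1 GiB, which already exceeds a 16 GB card. The fp8 weights come to about 13.0 GiB. This is why fp8 weights or offloading to system memory matter at the recommended memory size.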