Create Stunning Animated Videos with Ease: A Flux.1 and WanVideo Tutorial
Workflow Overview

This workflow combines the Flux.1 model with the WanVideo plugin to generate an image from a text prompt and then convert it into a short video. It has three stages:
Generating a high-quality image with Flux.1 from a text prompt.
Converting the image into a short animated video with WanVideo, keeping the scene stable while animating specific elements (e.g., a person).
Outputting the final MP4 video.
Core Models
Flux.1 (flux1-dev.sft): Efficient diffusion model for high-detail image generation.
LoRA (梦幻粘土世界_v1.0.safetensors, "Dreamy Clay World"): Adapts Flux.1 toward a dreamy clay style.
CLIP (DualCLIPLoader): Loads two CLIP models (runwayml and sd3/clip_l) for prompt processing.
VAE (ae.sft): Encodes/decodes images for Flux.1.
WanVideo Model (wan2.1_i2v_480p_14B_bf16_Comfy-Org.safetensors): Image-to-video generation model.
WanVideo T5 (umt5-xxl-enc-bf16.safetensors): UMT5-XXL text encoder for animation prompts.
WanVideo CLIP (open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors): Encodes the input image into embeddings.
WanVideo VAE (Wan2_1_VAE_bf16.safetensors): Encodes/decodes for video generation.
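Before loading the workflow, it can help to confirm that the model files listed above are actually on disk. The following is a minimal sketch, not part of the original workflow: the helper find_missing_models and the subfolder names in EXPECTED_MODELS are assumptions for illustration, so adjust them to wherever your ComfyUI installation and loader nodes look for each file. The two CLIP files used by DualCLIPLoader are omitted because their exact file names are not given here.

```python
from pathlib import Path

# Hypothetical folder layout: the subfolder names are assumptions, not taken
# from the tutorial. Adjust them to match your ComfyUI installation.
EXPECTED_MODELS = {
    "unet": ["flux1-dev.sft"],
    "loras": ["梦幻粘土世界_v1.0.safetensors"],
    "vae": ["ae.sft", "Wan2_1_VAE_bf16.safetensors"],
    "diffusion_models": ["wan2.1_i2v_480p_14B_bf16_Comfy-Org.safetensors"],
    "text_encoders": [
        "umt5-xxl-enc-bf16.safetensors",
        "open-clip-xlm-roberta-large-vit-huge-14_fp16.safetensors",
    ],
}


def find_missing_models(models_root, expected=EXPECTED_MODELS):
    """Return 'subfolder/filename' entries that do not exist under models_root."""
    root = Path(models_root)
    missing = []
    for subfolder, filenames in expected.items():
        for name in filenames:
            # A model counts as present only if it is a regular file.
            if not (root / subfolder / name).is_file():
                missing.append(f"{subfolder}/{name}")
    return missing
```

Calling find_missing_models("ComfyUI/models") (the path is an assumed location) returns an empty list when every expected file is present.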
Component Explanation
UNETLoader: Loads the Flux.1 model.
Installation: Default ComfyUI node.
DualCLIPLoader: Loads dual CLIP models.
Installation: Default ComfyUI node.
VAELoader: Loads Flux.1’s VAE.
Installation: Default ComfyUI node.
CLIPTextEncode: Encodes positive prompts.
Installation: Default ComfyUI node.
FluxGuidance: Adjusts Flux.1 generation guidance strength.
Installation: Default ComfyUI node.
BasicGuider: Provides sampling guidance.
Installation: Default ComfyUI node.
SamplerCustomAdvanced: Advanced sampler for latent image generation.
Installation: Default ComfyUI node.
VAEDecode: Decodes latent images.
Installation: Default ComfyUI node.
LoraLoader: Loads the LoRA model.
Installation: Default ComfyUI node.
WanVideoModelLoader: Loads the WanVideo model.
Installation: Install via ComfyUI Manager (WanVideo plugin); download the model from its official source.
LoadWanVideoT5TextEncoder: Loads T5 text encoder.
Installation: WanVideo plugin.
LoadWanVideoClipTextEncoder: Loads WanVideo CLIP.
Installation: WanVideo plugin.
WanVideoVAELoader: Loads WanVideo VAE.
Installation: WanVideo plugin.
WanVideoTextEncode: Encodes animation prompts.
Installation: WanVideo plugin.
WanVideoImageClipEncode: Encodes input image.
Installation: WanVideo plugin.
WanVideoSampler: Generates the video latents.
Installation: WanVideo plugin.
WanVideoDecode: Decodes video frames.
Installation: WanVideo plugin.
VHS_VideoCombine: Combines frames into MP4 video.
Installation: Install via ComfyUI Manager (VideoHelperSuite plugin).
Workflow Structure
Text-to-Image Base Group
Nodes: UNETLoader → LoraLoader → CLIPTextEncode → FluxGuidance → BasicGuider → SamplerCustomAdvanced → VAEDecode
Role: Generates a dreamy clay-style image.
Inputs: Prompt (e.g., “A miniature coffee factory…”), guidance strength (3.5), steps (20).
Outputs: A 1024x1024 image.
Wan Image-to-Video Group
Nodes: WanVideoModelLoader → WanVideoTextEncode → WanVideoImageClipEncode → WanVideoSampler → WanVideoDecode → VHS_VideoCombine
Role: Converts the image into an animated video.
Inputs: Image, animation prompt (e.g., “change this photo into animation…”), frames (10), steps (6).
Outputs: A short MP4 video.
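The two groups above can be written down as ordered node lists, which makes it easy to see where the image group hands off to the video group. This is only an illustrative sketch: the arrows follow the order listed in this tutorial rather than the literal socket wiring in ComfyUI, and the hand-off edge from VAEDecode to WanVideoImageClipEncode is an assumption based on the latter taking the generated image as input.

```python
# Node order as listed in the two workflow groups.
TEXT_TO_IMAGE = [
    "UNETLoader", "LoraLoader", "CLIPTextEncode", "FluxGuidance",
    "BasicGuider", "SamplerCustomAdvanced", "VAEDecode",
]
IMAGE_TO_VIDEO = [
    "WanVideoModelLoader", "WanVideoTextEncode", "WanVideoImageClipEncode",
    "WanVideoSampler", "WanVideoDecode", "VHS_VideoCombine",
]


def chain_edges(nodes):
    """Pair each node with the next one in the list."""
    return list(zip(nodes, nodes[1:]))


def pipeline_edges():
    """Edges of both groups plus the assumed image hand-off between them."""
    # Assumption: the decoded image feeds the WanVideo image encoder.
    handoff = ("VAEDecode", "WanVideoImageClipEncode")
    return chain_edges(TEXT_TO_IMAGE) + [handoff] + chain_edges(IMAGE_TO_VIDEO)


for source, target in pipeline_edges():
    print(f"{source} -> {target}")
```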
Inputs and Outputs
Inputs:
Positive prompt: “A miniature coffee factory where tiny baristas are brewing espresso…”.
Animation prompt: “change this photo into animation, keep the whole image and camera steady…”.
Resolution: 1024x1024 (image), 272x272 (video).
Seed: Fixed or random.
Outputs: A 10-frame MP4 video at 16 fps.
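To put these numbers in perspective, the short calculation below works out how long the clip runs and how many pixels each frame holds at the two resolutions. The function names are illustrative only; the values are the ones listed above.

```python
def clip_duration_seconds(frames, fps):
    """Playback length of a clip with the given frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


def pixels_per_frame(width, height):
    """Number of pixels in one frame."""
    return width * height


print(clip_duration_seconds(10, 16))   # 0.625 seconds
print(pixels_per_frame(1024, 1024))    # 1048576 pixels in the still image
print(pixels_per_frame(272, 272))      # 73984 pixels in each video frame
```

At 10 frames and 16 fps the output lasts well under a second, so raising the frame count is the main lever for a longer clip, at the cost of more memory and time.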
Notes and Considerations
Errors: Verify that each WanVideo model path is correct; a wrong path causes the loader node to fail.
Performance: To reduce GPU memory usage, use bf16 or fp8 precision and the offload_device option.
Compatibility: Update ComfyUI to the latest version before installing the WanVideo and VideoHelperSuite (VHS) plugins.
Resources: 16 GB of GPU memory is recommended; keep the frame and step counts low for stability.
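A rough back-of-the-envelope estimate shows why precision and offloading matter here. The sketch below assumes that the "14B" in the WanVideo model file name means about 14 billion parameters, and it counts weights only; activations, the text encoder, and the VAE need additional memory. The function name is illustrative.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gigabytes(params_billions, precision):
    """Approximate weight size in GB (1 GB = 10**9 bytes), weights only."""
    total_bytes = params_billions * 10**9 * BYTES_PER_PARAM[precision]
    return total_bytes / 10**9


print(weight_gigabytes(14, "bf16"))  # 28.0
print(weight_gigabytes(14, "fp8"))   # 14.0
```

Under these assumptions the bf16 weights alone exceed a 16 GB GPU, which is consistent with the advice to use fp8 or offloading.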