ComfyUI E-commerce Product Animation: Boost Sales with I2V Technology
1. Workflow Overview

This workflow generates e-commerce product videos from static images (e.g., jewelry, apparel), highlighting product details and how items look when worn. Key features:
Image-to-Video (I2V): Converts product images into dynamic clips (e.g., subtle bracelet movement on a model's wrist).
Smart Cropping: Automatically fits the input image to the video aspect ratio (e.g., 350x350 → 832x480) using letterbox padding.
Commercial-Grade Output: 30fps H.264 encoding (CRF19) balances quality and file size.
Core Models:
Wan2.1-I2V-14B: Main video model (480P, bf16/fp8 mixed precision).
UMT5-XXL Text Encoder: Processes Chinese product descriptions.
CLIP Vision Encoder: Analyzes image composition/color.
2. Key Components
Critical Nodes:
WanVideoModelLoader:
Loads the Wan2.1 model (Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors) with sdpa attention optimization.
WanVideoTextEncode:
Takes the product prompts, e.g.:
Positive: "Bracelet showcase video, highlighting craftsmanship."
Negative: "low quality, blurry, distortion"
ImageScaleByAspectRatio V2:
Resizes the input image (e.g., 847b03d5...jpg) with letterbox padding.
WanVideoSampler:
Settings:
Steps: 25
Sampler: unipc (fast convergence)
Seed: Random (can be fixed, e.g., 773185989414318).
VHS_VideoCombine:
Exports a 30fps MP4 (default: AnimateDiff.mp4).
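For reference, the export settings named above (30fps, H.264, CRF 19) can be approximated outside ComfyUI with ffmpeg. The sketch below only builds the command line and does not run it. The frame pattern and output name are hypothetical, and nothing here claims to mirror how VHS_VideoCombine encodes internally.

```python
def build_ffmpeg_args(frame_pattern, output_path, fps=30, crf=19):
    """Return an ffmpeg argument list that encodes numbered frames to H.264 MP4."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-framerate", str(fps),    # frame rate of the input image sequence
        "-i", frame_pattern,       # hypothetical pattern, e.g. frames/frame_%05d.png
        "-c:v", "libx264",         # H.264 encoder
        "-crf", str(crf),          # constant rate factor; lower means higher quality
        "-pix_fmt", "yuv420p",     # pixel format most players can decode
        output_path,
    ]


args = build_ffmpeg_args("frames/frame_%05d.png", "bracelet.mp4")
print(" ".join(args))
```

Lowering the CRF value raises quality and file size; raising it does the opposite.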
Dependencies:
Install ComfyUI-VideoHelperSuite.
Download models to:
Main model: ComfyUI/models/wan_video
VAE: Wan2_1_VAE_bf16.safetensors
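Before loading the workflow, a quick check that the model files are in place can save a failed run. The following is a minimal sketch using only the standard library. The main model directory follows the path given above; the VAE directory is not stated in this guide, so ComfyUI/models/vae is an assumption, as is the ComfyUI root location.

```python
from pathlib import Path


def missing_files(expected):
    """Return the expected paths that do not exist as files on disk."""
    return [p for p in map(Path, expected) if not p.is_file()]


# Assumed install root; adjust to the actual ComfyUI location.
comfy_root = Path("ComfyUI")
expected = [
    # Directory taken from the dependency list above.
    comfy_root / "models" / "wan_video" / "Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors",
    # ASSUMPTION: the guide does not say where the VAE file goes.
    comfy_root / "models" / "vae" / "Wan2_1_VAE_bf16.safetensors",
]

for path in missing_files(expected):
    print(f"missing: {path}")
```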
3. Workflow Structure
Model Loading:
Load video model, text encoder, VAE.
Input Processing (Operation Group):
Upload product image → Resize → CLIP encode.
Video Generation:
Fuse image features + text → Sampler → Latent frames.
Output:
Decode latent → MP4 synthesis (with metadata).
Key Parameters:
Resolution: Fixed 480P (832x480 landscape).
Frame Rate: 30fps (set in VHS_VideoCombine).
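To see what the letterbox resize to the fixed 832x480 frame does to a square or vertical product photo, the geometry can be worked out by hand. This is a sketch of a generic letterbox calculation (scale to fit, pad the remainder evenly); the function name is hypothetical, and the actual ImageScaleByAspectRatio node may round or align sizes differently.

```python
def letterbox_geometry(src_w, src_h, dst_w=832, dst_h=480):
    """Scale the source to fit inside the target, then pad the remainder evenly."""
    scale = min(dst_w / src_w, dst_h / src_h)  # largest scale that still fits
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = dst_w - new_w                      # total horizontal padding
    pad_y = dst_h - new_h                      # total vertical padding
    left = pad_x // 2
    top = pad_y // 2
    return {
        "scaled_size": (new_w, new_h),
        # Order: left, top, right, bottom
        "padding": (left, top, pad_x - left, pad_y - top),
    }


print(letterbox_geometry(350, 350))
print(letterbox_geometry(1440, 1920))
```

A 350x350 image is scaled to 480x480 and gets 176-pixel bars on the left and right. A 1440x1920 image is scaled to 360x480 with 236-pixel bars on each side, so vertical photos leave more than half of the frame as padding.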
4. Input & Output
Input Parameters:
Image: Square/vertical (e.g., 1440x1920), clear product focus.
Text Prompt: Concise product highlights (Chinese preferred).
Output:
480P MP4 video (e.g., bracelet animation), saved to ComfyUI/output.
5. Notes
VRAM: ≥12GB required (16GB recommended).
Image Tips:
Clear subject, simple background (pre-cropped).
Avoid text/watermarks interfering with CLIP.
Troubleshooting:
Choppy video? Reduce WanVideoSampler steps (e.g., 20).
Distortion? Check that the ImageScaleByAspectRatio mode is letterbox.
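When trying the fixes above repeatedly, it can be easier to change sampler settings in an exported workflow file than to edit the node by hand. The sketch below patches a workflow held as a Python dict in ComfyUI's API export shape (node id mapped to class_type and inputs). The node ids and the input names "steps" and "seed" are assumptions for illustration; check them against the actual exported JSON.

```python
import copy
import json


def patch_nodes(workflow, class_type, overrides):
    """Return a copy of the workflow with inputs overridden on matching nodes."""
    patched = copy.deepcopy(workflow)  # leave the original untouched
    for node in patched.values():
        if node.get("class_type") == class_type:
            node.setdefault("inputs", {}).update(overrides)
    return patched


# Tiny stand-in for an exported workflow; ids and input names are hypothetical.
workflow = {
    "27": {"class_type": "WanVideoSampler", "inputs": {"steps": 25, "seed": 0}},
    "30": {"class_type": "VHS_VideoCombine", "inputs": {"frame_rate": 30}},
}

# Lower the step count and pin the seed so runs can be compared directly.
patched = patch_nodes(
    workflow, "WanVideoSampler", {"steps": 20, "seed": 773185989414318}
)
print(json.dumps(patched, indent=2))
```

Pinning the seed while changing one setting at a time makes it clearer which change actually affected the output.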