Unlock Smooth Transitions: Wan2.1 Start/End Frame Video Stabilization Workflow
1. Workflow Overview

This workflow, named "Wan2.1 Start/End Frame Video Stabilization (Industry Standard)", generates smooth transitional animations from start/end frames with:
Frame interpolation between input images
Video stabilization via the Wan2.1 model
4x super-resolution (4xFaceUpSharpDAT)
Frame rate boosting (RIFE VFI)
2. Core Models
Model Name | Function | Source/Installation |
---|---|---|
Wan2.1-Fun-InP-14B | Video generation & interpolation | Manual download to |
4xFaceUpSharpDAT | Face super-resolution | Install via |
RIFE VFI (rife47.pth) | Frame interpolation | Manual download to |
Wan2_1_VAE_bf16 | Video VAE (encoder/decoder) | Manual download to |
3. Key Nodes
Node Name | Function | Installation |
---|---|---|
WanVideoModelLoader | Loads Wan2.1 model | Requires ComfyUI-WanVideoWrapper custom nodes (plus manual model download) |
WanVideoImageToVideoEncode | Encodes start/end frames | Requires ComfyUI-WanVideoWrapper custom nodes |
RIFE VFI | 10x frame interpolation | Requires |
ImageUpscaleWithModel | Super-resolution upscaling | Built-in |
VHS_VideoCombine | Video rendering & export | Requires |
4. Workflow Structure
Input & Preprocessing
Input: Start frame (output (6).png) and end frame (output (7).png)
Processing:
Resize images (default: 480x768)
Extract features via WanVideoClipVisionEncode
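The resize step can be sketched in Python as below. This is a minimal illustration, not the workflow's own node: `target_size` and `resize_pair` are hypothetical helpers, and the aspect-preserving fit plus rounding each side down to a multiple of 16 are assumptions (the 480x768 default already satisfies both). Image loading relies on Pillow 9.1 or later for `Image.Resampling`.

```python
def target_size(src_w: int, src_h: int, max_side: int = 768, multiple: int = 16) -> tuple[int, int]:
    """Fit the longer edge within max_side (never upscaling), then round each
    side down to a multiple of `multiple` (assumed model constraint)."""
    longest = max(src_w, src_h)
    if longest > max_side:
        # Integer arithmetic avoids float rounding surprises.
        w, h = src_w * max_side // longest, src_h * max_side // longest
    else:
        w, h = src_w, src_h
    w = max(multiple, w // multiple * multiple)
    h = max(multiple, h // multiple * multiple)
    return w, h


def resize_pair(start_path: str, end_path: str, size: tuple[int, int]):
    """Load the start and end frames and resize both to the same size."""
    from PIL import Image  # Pillow >= 9.1; imported here so target_size works without it

    frames = []
    for path in (start_path, end_path):
        with Image.open(path) as img:
            frames.append(img.convert("RGB").resize(size, Image.Resampling.LANCZOS))
    return frames


# Usage with the workflow's example inputs:
# start, end = resize_pair("output (6).png", "output (7).png", (480, 768))
```

For example, a 960x1536 source maps to the 480x768 default. Both frames get the same size so the start and end frames line up.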
Video Generation
Load the Wan2.1 model and VAE
Generate intermediate frames (total: 81 frames, ≈5 s at 16 fps) using WanVideoSampler
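The frame count and clip length are simple arithmetic. The sketch below assumes, as is commonly documented for Wan2.1, that valid frame counts have the form 4n + 1 because the video VAE compresses time by a factor of 4; 81 = 4 × 20 + 1 fits this. The helper names are hypothetical.

```python
def clip_seconds(num_frames: int, fps: float = 16.0) -> float:
    """Playback length of a clip in seconds."""
    return num_frames / fps


def nearest_valid_frame_count(requested: int) -> int:
    """Round down to the nearest count of the form 4n + 1 (assumed Wan2.1 constraint)."""
    if requested < 1:
        raise ValueError("need at least one frame")
    return (requested - 1) // 4 * 4 + 1


def frames_for_duration(seconds: float, fps: float = 16.0) -> int:
    """Frame count for a target duration: seconds * fps intervals plus one
    frame, snapped to 4n + 1."""
    return nearest_valid_frame_count(round(seconds * fps) + 1)


print(clip_seconds(81))          # 5.0625
print(frames_for_duration(5.0))  # 81
```

So 81 frames at 16 fps last 5.0625 s, and asking for 5 s gives back 81 frames. A request for 80 frames would snap down to 77.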
Post-Processing
Upscaling: 4x super-resolution with 4xFaceUpSharpDAT
Interpolation: Boost to 32 fps via RIFE VFI
Export: MP4 video (H.264, CRF=19)
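The effect of the post-processing chain on the clip can be tracked with a few lines of Python. This sketch assumes the RIFE stage inserts `multiplier - 1` frames between each adjacent pair, so n frames become (n - 1) × multiplier + 1; the actual node may count differently. `ffmpeg_h264_args` builds a standalone ffmpeg command with the same codec and CRF as the export step, for re-encoding frames outside ComfyUI. It is not the command VHS_VideoCombine runs internally, and the frame filename pattern is a hypothetical example.

```python
from dataclasses import dataclass


@dataclass
class ClipSpec:
    width: int
    height: int
    frames: int
    fps: float


def upscale(spec: ClipSpec, factor: int = 4) -> ClipSpec:
    """Super-resolution multiplies both sides; frame count and fps are unchanged."""
    return ClipSpec(spec.width * factor, spec.height * factor, spec.frames, spec.fps)


def interpolate(spec: ClipSpec, multiplier: int = 2) -> ClipSpec:
    """Assumes (multiplier - 1) new frames between each adjacent pair."""
    frames = (spec.frames - 1) * multiplier + 1
    return ClipSpec(spec.width, spec.height, frames, spec.fps * multiplier)


def ffmpeg_h264_args(pattern: str, out_path: str, fps: float, crf: int = 19) -> list[str]:
    """ffmpeg arguments: image sequence in, H.264 MP4 out at the given CRF."""
    return [
        "ffmpeg", "-y",
        "-framerate", f"{fps:g}",
        "-i", pattern,
        "-c:v", "libx264",
        "-crf", str(crf),
        "-pix_fmt", "yuv420p",  # widely compatible pixel format for MP4 players
        out_path,
    ]


final = interpolate(upscale(ClipSpec(480, 768, 81, 16.0)))
args = ffmpeg_h264_args("frames/%05d.png", "out.mp4", final.fps)
```

Starting from 81 frames at 480x768 and 16 fps, this gives 161 frames at 1920x3072 and 32 fps, still about 5 s. To actually run the command, pass the list to `subprocess.run(args, check=True)` with ffmpeg on the PATH.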
5. Inputs & Outputs
Input Parameters:
Required: Start/end frame images (PNG/JPG)
Optional: Total frames (default: 81), resolution (recommended ≤768 px), prompts (e.g., "cartoon girl turning")
Output: Video file (AnimateDiff_*.mp4 in ComfyUI/output)
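The input contract above can be captured in a small dataclass. This is a hypothetical sketch: the field names and `validate` checks are illustrative and only mirror the listed defaults and recommendations, and `latest_output` simply picks the most recently modified `AnimateDiff_*.mp4` in the output folder.

```python
import glob
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowInputs:
    """Hypothetical container for the workflow's inputs and their defaults."""
    start_frame: str
    end_frame: str
    num_frames: int = 81
    width: int = 480
    height: int = 768
    prompt: str = ""  # e.g. "cartoon girl turning"

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the inputs look OK."""
        problems = []
        for path in (self.start_frame, self.end_frame):
            if not path.lower().endswith((".png", ".jpg", ".jpeg")):
                problems.append(f"{path}: expected a PNG or JPG image")
        if max(self.width, self.height) > 768:
            problems.append("resolution exceeds the recommended 768 px")
        if self.num_frames < 1:
            problems.append("num_frames must be at least 1")
        return problems


def latest_output(output_dir: str = "ComfyUI/output") -> Optional[str]:
    """Path of the most recently modified AnimateDiff_*.mp4, or None if there is none."""
    matches = glob.glob(os.path.join(output_dir, "AnimateDiff_*.mp4"))
    return max(matches, key=os.path.getmtime) if matches else None
```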
6. Notes
Dependencies:
Wan2.1 and RIFE models must be downloaded manually.
Super-resolution models can be installed via ComfyUI Manager.
Hardware:
Recommended: ≥12GB VRAM (NVIDIA GPU).
High frame counts (e.g., 81) may take 10-30 minutes.
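A quick way to compare the local GPU against the ≥12GB recommendation, assuming PyTorch is installed (ComfyUI depends on it). The threshold is read as decimal gigabytes (12 × 10^9 bytes), an assumption chosen so that a card sold as 12 GB, which may report slightly less than 12 GiB, still passes.

```python
from typing import Optional

# "12GB" read as decimal gigabytes (assumption), so cards that report
# slightly under 12 GiB still pass.
RECOMMENDED_VRAM_BYTES = 12 * 10**9


def meets_vram_recommendation(total_bytes: int) -> bool:
    """True if the given amount of VRAM meets the workflow's recommendation."""
    return total_bytes >= RECOMMENDED_VRAM_BYTES


def local_gpu_vram_bytes(device: int = 0) -> Optional[int]:
    """Total memory of a CUDA device via PyTorch, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory


vram = local_gpu_vram_bytes()
if vram is None:
    print("No CUDA GPU detected")
else:
    print(f"{vram / 10**9:.1f} GB VRAM, meets recommendation: {meets_vram_recommendation(vram)}")
```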
Troubleshooting:
Reduce the resolution or frame rate if errors (such as out-of-memory errors) occur.
If faces show artifacts, enable super-resolution or adjust the prompts.
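The "reduce resolution" advice can be automated as a retry ladder. In this sketch `run` is a hypothetical callable that executes the workflow at a given width and height; the scale steps and the multiple-of-16 rounding are assumptions. By default only `MemoryError` triggers a retry; with PyTorch you would pass `retry_on=(torch.cuda.OutOfMemoryError,)` instead.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_with_fallback(
    run: Callable[[int, int], T],
    width: int,
    height: int,
    scales: Iterable[float] = (1.0, 0.75, 0.5),
    multiple: int = 16,
    retry_on: tuple[type[BaseException], ...] = (MemoryError,),
) -> tuple[T, tuple[int, int]]:
    """Call run(w, h) at progressively smaller resolutions until one succeeds.

    Returns the result together with the (width, height) that worked.
    """
    last_error = None
    for scale in scales:
        # Scale, then round each side down to a multiple of `multiple` (assumed constraint).
        w = max(multiple, int(width * scale) // multiple * multiple)
        h = max(multiple, int(height * scale) // multiple * multiple)
        try:
            return run(w, h), (w, h)
        except retry_on as err:
            last_error = err
    raise RuntimeError("workflow failed at every fallback resolution") from last_error
```

Starting from 480x768, the second attempt runs at 352x576 (0.75 scale, rounded down to multiples of 16) and the third at 240x384.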