Create Adorable Cat Videos with AI: A Low-VRAM Workflow

ComfyUI.org
2025-04-18 11:10:50

1. Workflow Overview


This workflow turns a static image of a cat into a short video (image-to-video). It then improves the result with super-resolution upscaling and frame interpolation and outputs a high-quality MP4 file. GGUF quantized models keep VRAM usage low, so the workflow can run locally on GPUs with limited memory.

2. Core Models

| Model Name                     | Function                                            |
|--------------------------------|-----------------------------------------------------|
| wan2.1-i2v-14b-480p-Q6_K.gguf  | Quantized image-to-video model (low VRAM, ~6GB).    |
| umt5_xxl_fp8_e4m3fn_scaled     | Multilingual text encoder for prompt conditioning.  |
| 4x-ClearRealityV1.pth          | Super-resolution model for upscaling.               |
| rife49.pth                     | RIFE frame interpolation model for smoother motion. |

3. Key Nodes

3.1 Required Custom Nodes

  • ComfyUI-GGUF: Loads quantized models (install via ComfyUI Manager).

  • ComfyUI-Frame-Interpolation: Frame interpolation plugin (manual GitHub install).

  • ComfyUI-VideoHelperSuite: Combines frames into a video (MP4 export).
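For the manual install route, each pack is a Git repository cloned into ComfyUI's custom_nodes folder. The sketch below only builds the clone commands and runs nothing by default; COMFYUI_ROOT is a hypothetical path, and each pack may list extra Python requirements in its README that still need to be installed.

```python
"""Build `git clone` commands for the custom node packs used by this workflow.

Assumption: COMFYUI_ROOT is a hypothetical install path; adjust it to yours.
Nothing is executed unless run_commands() is called explicitly.
"""
import subprocess
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # hypothetical path to your ComfyUI install

# Public GitHub repositories for the three node packs named above.
NODE_REPOS = {
    "ComfyUI-GGUF": "https://github.com/city96/ComfyUI-GGUF",
    "ComfyUI-Frame-Interpolation": "https://github.com/Fannovel16/ComfyUI-Frame-Interpolation",
    "ComfyUI-VideoHelperSuite": "https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite",
}


def clone_commands(root: Path) -> list[list[str]]:
    """Return one `git clone` argument list per pack that is not already present."""
    custom_nodes = root / "custom_nodes"
    commands = []
    for name, url in NODE_REPOS.items():
        target = custom_nodes / name
        if not target.exists():  # skip packs that are already installed
            commands.append(["git", "clone", url, str(target)])
    return commands


def run_commands(commands: list[list[str]]) -> None:
    """Execute the commands; raises CalledProcessError if a clone fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands only; call run_commands(...) to actually clone.
    for cmd in clone_commands(COMFYUI_ROOT):
        print(" ".join(cmd))
```

Restart ComfyUI after cloning so the new nodes are registered.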

3.2 Dependencies

  • Model Files:

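    • Download wan2.1-i2v-14b-480p-Q6_K.gguf to models/gguf. If the GGUF loader node does not list it, move it to models/unet, the folder ComfyUI-GGUF reads by default.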

    • Place 4x-ClearRealityV1.pth in models/upscale_models and rife49.pth in models/frame_interpolation.
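A quick way to catch misplaced files before loading the workflow is to check each expected path. In this sketch COMFYUI_ROOT is a hypothetical install path; the folders are the ones listed in this article, plus models/unet for the GGUF file, since that is where the ComfyUI-GGUF loader looks by default.

```python
"""Check that the workflow's model files are where ComfyUI expects them.

Assumptions: COMFYUI_ROOT is a hypothetical install path, and the folder
names mirror the locations listed in this article.
"""
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # hypothetical path to your ComfyUI install

# File name -> candidate folders, relative to the ComfyUI root.
EXPECTED_FILES = {
    "wan2.1-i2v-14b-480p-Q6_K.gguf": ("models/gguf", "models/unet"),
    "umt5_xxl_fp8_e4m3fn_scaled.safetensors": ("models/clip",),
    "4x-ClearRealityV1.pth": ("models/upscale_models",),
    "rife49.pth": ("models/frame_interpolation",),
}


def missing_models(root: Path) -> list[str]:
    """Return the file names not found in any of their candidate folders."""
    missing = []
    for filename, folders in EXPECTED_FILES.items():
        if not any((root / folder / filename).is_file() for folder in folders):
            missing.append(filename)
    return missing


if __name__ == "__main__":
    for name in missing_models(COMFYUI_ROOT):
        print(f"missing: {name}")
```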

4. Workflow Structure

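| Group Name       | Inputs                     | Outputs                       | Logic                                              |
|------------------|----------------------------|-------------------------------|----------------------------------------------------|
| Text Encoding    | Positive/negative prompts  | Conditioning vectors          | Steers video content and style (e.g., "cute cat"). |
| Image Preprocess | Input image (512x768)      | Resized image                 | Resizes the image to the video resolution.         |
| Image-to-Video   | Image + conditioning + VAE | Low-res video latent          | Generates the initial video with the Wan model.    |
| Video Enhance    | Decoded video frames       | Upscaled, interpolated frames | Upscaling + frame interpolation (30fps).           |
| Video Export     | Enhanced image sequence    | Final MP4                     | Adds metadata and saves the file.                  |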

5. Inputs & Outputs

  • Inputs:

    • Required: Cat image (512x768 recommended) and a positive prompt (e.g., "sparkling eyes").

    • Optional: Negative prompt (the default suppresses low-quality output) and seed value.

  • Output: MP4 video (720p, 30fps, H.264 encoded).
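The output spec (H.264 MP4 at 30fps) maps onto a standard ffmpeg invocation. Outside ComfyUI, an equivalent export from a folder of saved frames could be built as below; the frame naming pattern and the CRF value are assumptions, and the frames are assumed to already be at the final resolution.

```python
"""Build an ffmpeg command that encodes a PNG frame sequence to H.264 MP4.

Assumptions: frames are saved as frame_00001.png, frame_00002.png, ...
(a hypothetical naming scheme) at the final resolution, and ffmpeg is on PATH.
"""


def ffmpeg_export_cmd(pattern: str, out_path: str, fps: int = 30, crf: int = 19) -> list[str]:
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-framerate", str(fps),    # frame rate of the input image sequence
        "-i", pattern,             # e.g. "frames/frame_%05d.png"
        "-c:v", "libx264",         # H.264 encoder
        "-pix_fmt", "yuv420p",     # widely supported pixel format
        "-crf", str(crf),          # quality: lower is better but gives larger files
        out_path,
    ]


if __name__ == "__main__":
    print(" ".join(ffmpeg_export_cmd("frames/frame_%05d.png", "cat.mp4")))
```

Passing the returned list to subprocess.run(cmd, check=True) performs the encode. yuv420p is chosen for player compatibility; with libx264 it requires even frame dimensions.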

6. Notes

  1. VRAM Optimization: GGUF models reduce VRAM to ~6GB but slow down generation.

  2. Frame Rate: Adjust the RIFE VFI node's interpolation frames setting (default: 10) to trade processing time against output smoothness.

  3. Debugging: If the CLIP model is reported missing, check that umt5_xxl_fp8_e4m3fn_scaled.safetensors is in models/clip.
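Interpolation cost grows with the number of frames the model has to synthesize. Assuming the interpolator keeps every source frame and inserts (multiplier - 1) new frames between each consecutive pair (the RIFE VFI node's exact counting may differ), the arithmetic is:

```python
"""Frame-count arithmetic for frame interpolation (sketch).

Assumption: every source frame is kept and (multiplier - 1) new frames
are inserted between each consecutive pair.
"""


def interpolated_count(n_frames: int, multiplier: int) -> int:
    """Total frames after interpolation."""
    if n_frames < 2 or multiplier < 1:
        return n_frames
    return (n_frames - 1) * multiplier + 1


def summary(n_frames: int, multiplier: int, fps: float = 30.0) -> dict[str, float]:
    total = interpolated_count(n_frames, multiplier)
    return {
        "total_frames": total,
        "synthesized": total - n_frames,  # frames the model must compute; cost scales with this
        "seconds_at_fps": total / fps,    # playback length at the export frame rate
    }


if __name__ == "__main__":
    for m in (2, 3, 4):
        print(m, summary(81, m))  # 81 source frames is a hypothetical clip length
```

At a fixed 30fps export rate, extra frames lengthen the clip (motion plays back slower) unless the frame rate is raised in proportion.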