Create Breathtaking Architectural Videos with Our Advanced Low-Memory Solution

ComfyUI.org
2025-06-12 07:39:00

1. Workflow Overview


A low-VRAM animation generator for architectural scenes featuring:

  • Long Sequences: 60s generation with 6GB VRAM via FramePack

  • Memory Optimization: Tiled decoding + temporal slicing

  • Multimodal Control: CLIP vision + text prompts for motion

  • Prompt Assistant: Built-in template in Note node

2. Core Models

| Model | Function | Source |
|---|---|---|
| FramePackI2V_HY | Video diffusion (BF16) | HuggingFace |
| hunyuan_video_vae_bf16 | Lightweight VAE | Manual download |
| sigclip_vision_patch14_384 | Visual encoder | Auto-installed |

3. Key Nodes

| Node | Purpose | Installation |
|---|---|---|
| FramePackSampler | Frame-wise sampling | GitHub |
| VAEDecodeTiled | Memory-efficient decoding | Built-in (enable temporal_overlap) |
| VHS_VideoCombine | Video rendering | VideoHelperSuite |
| ImageResize+ | Smart resizing | ComfyUI Manager |

4. Pipeline Stages

Stage 1: Input Processing

  • Image Input: Load via LoadImage (e.g., work-04.jpg)

  • Resolution Matching: Auto-optimize with FramePackFindNearestBucket

  • Feature Extraction: CLIPVisionEncode encodes visual cues
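
To illustrate the resolution-matching step, here is a minimal sketch of picking the bucket whose aspect ratio best fits the input image. The bucket list and the function name are assumptions for illustration only; the buckets and selection rule that FramePackFindNearestBucket actually uses may differ.

```python
# Illustrative (height, width) buckets; assumed values, not read from the node.
BUCKETS = [
    (416, 960), (448, 864), (480, 832), (512, 768), (544, 704),
    (576, 672), (608, 640), (640, 608), (672, 576), (704, 544),
    (768, 512), (832, 480), (864, 448), (960, 416),
]


def find_nearest_bucket(height, width, buckets=BUCKETS):
    """Return the (height, width) bucket that best matches the input shape."""
    # Cross-multiplying compares the two aspect ratios without division;
    # a perfect match gives height * bucket_w == width * bucket_h.
    return min(buckets, key=lambda b: abs(height * b[1] - width * b[0]))


# A 1080x1920 (height x width) landscape image.
print(find_nearest_bucket(1080, 1920))  # (480, 832)
```

Snapping to a fixed set of shapes keeps the pixel count roughly constant across aspect ratios, which is what makes memory use predictable.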

Stage 2: Animation Core

  • Sampling:

    • 30 steps, CFG=10, UniPC-BH1 sampler

    • Speed and VRAM settings: TeaCache threshold 0.15, temporal_size=64

  • Motion Control: Text prompts (e.g., "slow zoom-in")
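
The sampling settings above can be collected in one place, together with a rough estimate of how many sampling sections a clip needs. This is a sketch under stated assumptions: the dictionary keys are illustrative rather than the node's real input names, and the section arithmetic assumes 30 FPS output, 4 video frames per latent frame, and rounding to the nearest whole section, which may not match FramePackSampler exactly.

```python
# Values taken from this article; key names are illustrative only.
sampler_settings = {
    "steps": 30,
    "cfg": 10,
    "sampler": "UniPC-BH1",
    "teacache_threshold": 0.15,
    "latent_window_size": 9,
    "total_second_length": 60,
    "prompt": "slow zoom-in",
}


def latent_sections(total_second_length, latent_window_size=9, fps=30):
    """Estimate how many sampling sections a clip of the given length needs."""
    # Assumption: each latent frame expands to 4 video frames.
    frames_per_section = latent_window_size * 4
    sections = round(total_second_length * fps / frames_per_section)
    return max(sections, 1)  # always sample at least one section


print(latent_sections(sampler_settings["total_second_length"]))  # 50
```

Under these assumptions, a shorter total_second_length means fewer sections to sample, and a smaller latent_window_size means fewer frames are processed per section.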

Stage 3: Output

  • Tiled Decoding: 128x128 blocks via VAEDecodeTiled

  • Video Export: MP4 output (30FPS, H.264)
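
Temporal slicing during decoding can be pictured as overlapping windows over the frame sequence. The following is a generic sketch of that idea, not the implementation inside VAEDecodeTiled; the function name is hypothetical, and it assumes temporal_size and temporal_overlap are both counted in frames.

```python
def temporal_windows(num_frames, temporal_size=64, temporal_overlap=8):
    """Yield (start, end) ranges that cover num_frames with overlapping windows."""
    if temporal_overlap >= temporal_size:
        raise ValueError("temporal_overlap must be smaller than temporal_size")
    if num_frames <= 0:
        return
    # Each new window starts this many frames after the previous one.
    stride = temporal_size - temporal_overlap
    start = 0
    while True:
        end = min(start + temporal_size, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += stride


print(list(temporal_windows(150)))  # [(0, 64), (56, 120), (112, 150)]
```

Only one window is decoded at a time, so peak memory depends on temporal_size rather than clip length, while the shared frames give the decoder room to blend across window boundaries.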

5. Inputs & Outputs

  • Required Inputs:

    • Static architecture image

    • Motion prompts (Chinese preferred)

  • Outputs:

    • MP4 video (FramePack_00001.mp4)

    • Resolution: 512x512 to 1024x1024

6. Critical Notes

  1. VRAM Management:

    • Reduce total_second_length (roughly 1 GB of VRAM per 10 s of video)

    • Set gpu_memory_preservation (default 6, in GB)

  2. Dependencies:

    • Download FramePackI2V_HY and hunyuan_video_vae_bf16

    • CLIP models auto-download

  3. Troubleshooting:

    • CUDA OOM → Lower latent_window_size (default=9)

    • Choppy video → Set temporal_overlap to 8 or higher in VAEDecodeTiled
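
The troubleshooting rules above can be expressed as a small checklist function. This is a hypothetical helper, not part of ComfyUI or FramePack; it only restates the advice in this section, and the dictionary keys are assumed to mirror the parameter names used here.

```python
def review_settings(settings, hit_oom=False, choppy=False):
    """Return suggestions drawn from the notes in this section."""
    tips = []
    if hit_oom:
        # Both knobs are named in the VRAM and troubleshooting notes.
        if settings.get("latent_window_size", 9) > 1:
            tips.append("lower latent_window_size")
        tips.append("reduce total_second_length")
    if choppy and settings.get("temporal_overlap", 0) < 8:
        tips.append("raise temporal_overlap to at least 8")
    return tips


print(review_settings({"latent_window_size": 9, "temporal_overlap": 4},
                      hit_oom=True, choppy=True))
```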