MimicMotion Explained: How to Use Diffusion Models for Animation in ComfyUI

ComfyUI.org
2025-03-04 16:05:56

Workflow Overview

  • Purpose and Function: This generates an animated video using the MimicMotion model. It takes a reference image and a set of pose images (extracted from a video) as input, uses MimicMotion to create an animation mimicking the pose sequence, and outputs an MP4 video.

  • Core Models:

    • MimicMotionMergedUnet: A diffusion-based motion generation model that creates animated frames from a reference image and pose sequence.

    • Stable Video Diffusion Components (implicit): MimicMotion builds on Stable Video Diffusion's latent diffusion architecture, so frames are encoded into a latent space, denoised there, and decoded back to images.
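To make the "diffusion-based" part concrete: generation starts from pure noise and is denoised step by step along a decreasing noise (sigma) schedule. The toy loop below sketches Euler-discrete sampling in that spirit; the denoiser here is a stand-in, not MimicMotion's actual UNet:

```python
import random

def toy_denoiser(x, sigma):
    # Stand-in for the UNet: pretend the model recovers a cleaner sample by
    # shrinking toward zero; the real model predicts this from the reference
    # image and the pose conditioning.
    return [v / (1.0 + sigma) for v in x]

def euler_discrete_sample(size=16, steps=20, sigma_max=10.0, seed=42):
    """Schematic Euler-discrete sampling: start from pure noise at sigma_max
    and walk the sigma schedule down toward zero, one Euler step per
    diffusion step."""
    rng = random.Random(seed)
    sigmas = [sigma_max * (1 - i / steps) for i in range(steps + 1)]
    x = [rng.gauss(0.0, sigmas[0]) for _ in range(size)]
    for i in range(steps):
        denoised = toy_denoiser(x, sigmas[i])
        # derivative dx/dsigma, then one Euler step to the next sigma
        step = sigmas[i + 1] - sigmas[i]
        x = [xv + (xv - dv) / sigmas[i] * step for xv, dv in zip(x, denoised)]
    return x

latents = euler_discrete_sample()
```

In the real workflow the denoiser is MimicMotionMergedUnet conditioned on the reference image and pose frames, and the sigma schedule comes from the DiffusersScheduler node.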

Component Breakdown

  • Key Components (Nodes):

    1. LoadImage: Loads the reference image to define the animation’s base style.

    2. VHS_LoadVideo: Loads an input video to extract pose frames and audio.

    3. ImageResizeKJ: Resizes the reference image and pose frames to match model requirements (768x1024).

    4. MimicMotionGetPoses: Detects body poses in the reference image and in each video frame, producing a pose image sequence aligned with the reference.

    5. GetImageSizeAndCount: Retrieves image size and frame count information.

    6. DownloadAndLoadMimicMotionModel: Downloads and loads the MimicMotion model.

    7. DiffusersScheduler: Defines the diffusion scheduler (EulerDiscreteScheduler) for generation steps.

    8. MimicMotionSampler: Core sampling node that generates latent representations from the reference and pose sequence.

    9. MimicMotionDecode: Decodes latent representations into an image sequence.

    10. VHS_VideoCombine: Combines the image sequence and audio into a final video.

  • Installation Methods:

    • Basic Nodes (e.g., LoadImage): Included in ComfyUI by default.

    • MimicMotion Nodes: Install the MimicMotion plugin via ComfyUI Manager (search “MimicMotion”) or manually from GitHub into custom_nodes.

    • VHS Nodes: Install VideoHelperSuite (VHS) via ComfyUI Manager or GitHub.

    • ImageResizeKJ: Install the KJNodes plugin (GitHub: https://github.com/kijai/ComfyUI-KJNodes).

  • Dependencies on Special Models or Plugins:

    • MimicMotionMergedUnet_1-0-fp16.safetensors: Download from official sources (e.g., Hugging Face or MimicMotion project page) and place in models/checkpoints.
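A quick way to catch the "missing model file" failure mode before queuing a long generation is to check the expected path directly. A minimal sketch (the layout follows the note above; `find_checkpoint` is a hypothetical helper, and your ComfyUI root may differ):

```python
from pathlib import Path

def find_checkpoint(comfyui_root,
                    filename="MimicMotionMergedUnet_1-0-fp16.safetensors"):
    """Return the checkpoint's path if it is where this workflow expects it
    (models/checkpoints under the ComfyUI root), else None."""
    candidate = Path(comfyui_root) / "models" / "checkpoints" / filename
    return candidate if candidate.is_file() else None

# Example: find_checkpoint("/opt/ComfyUI") -> Path(...) or None
```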

Workflow Structure

  • Groups (Not explicitly grouped in JSON; logically divided):

    1. Input Preprocessing Group:

      • Nodes: LoadImage, VHS_LoadVideo, ImageResizeKJ (ID 28, 35), MimicMotionGetPoses

      • Role: Loads reference image and video, resizes them, and prepares pose sequence.

      • Inputs: Reference image file, video file.

      • Outputs: Resized reference image and pose image sequence.

    2. Model Loading and Scheduling Group:

      • Nodes: DownloadAndLoadMimicMotionModel, DiffusersScheduler

      • Role: Loads the MimicMotion model and configures the sampling scheduler.

      • Inputs: Model file path, scheduler settings (the scheduler type and its numeric parameters, set in DiffusersScheduler).

      • Outputs: Mimic Pipeline and scheduler.

    3. Animation Generation Group:

      • Nodes: GetImageSizeAndCount, MimicMotionSampler, MimicMotionDecode

      • Role: Generates animation frames based on reference and pose sequence.

      • Inputs: Reference image, pose images, model pipeline, scheduler.

      • Outputs: Image sequence.

    4. Video Synthesis Group:

      • Nodes: VHS_VideoCombine

      • Role: Combines image sequence and audio into an MP4 video.

      • Inputs: Image sequence, audio.

      • Outputs: MP4 video file.
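Since the four groups form a straight pipeline, the data flow can be sketched by threading tensor shapes through stub functions. Every body below is a placeholder for the real node, and the 8x latent downsampling factor is an assumption borrowed from SD-family VAEs, not a documented MimicMotion detail:

```python
H, W = 1024, 768  # working resolution from the Input Preprocessing group

def get_poses(frame_shape, n_frames):
    """Stub for MimicMotionGetPoses: one pose image per video frame."""
    return (n_frames,) + frame_shape

def sample(pose_shape):
    """Stub for MimicMotionSampler: one latent per pose frame; the 8x
    spatial downsampling is an assumption borrowed from SD-family VAEs."""
    n, h, w, _ = pose_shape
    return (n, 4, h // 8, w // 8)

def decode(latent_shape):
    """Stub for MimicMotionDecode: latents back to RGB frames."""
    n, _, h8, w8 = latent_shape
    return (n, h8 * 8, w8 * 8, 3)

poses = get_poses((H, W, 3), n_frames=24)   # Input Preprocessing group
latents = sample(poses)                     # Animation Generation group
video = decode(latents)                     # ready for VHS_VideoCombine
print(video)  # (24, 1024, 768, 3)
```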

Inputs and Outputs

  • Expected Input Parameters:

    • Reference Image: A PNG/JPG image (e.g., 296930741-...png).

    • Pose Video: An MP4 video (e.g., 1月21日.mp4) providing the pose sequence.

    • Resolution: Set to 768x1024.

    • Sampling Parameters: Steps (20), Seed (42), CFG Scale, etc. (set in MimicMotionSampler).

  • Final Output:

    • An MP4 video with the prefix “MimicMotion” (e.g., MimicMotion_00001-audio.mp4), at 12 fps.
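Clip length follows directly from frame count and frame rate, and VHS_LoadVideo's widgets control how many frames reach the sampler. A small sketch of that arithmetic (parameter names mirror the VHS widgets; a cap of 0 is treated as "no cap" here):

```python
def loaded_frames(source_frames, frame_load_cap=0, select_every_nth=1):
    """Frames VHS_LoadVideo yields: every nth source frame, optionally
    capped (0 is treated as 'no cap' here)."""
    n = (source_frames + select_every_nth - 1) // select_every_nth
    return min(n, frame_load_cap) if frame_load_cap else n

def duration_seconds(frames, fps=12):
    """Output clip length at the workflow's 12 fps setting."""
    return frames / fps

# A 10-second 30 fps source (300 frames), every 2nd frame, capped at 72:
n = loaded_frames(300, frame_load_cap=72, select_every_nth=2)
print(n, duration_seconds(n))  # 72 frames -> 6.0 seconds at 12 fps
```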

Notes and Considerations

  • Potential Errors:

    • Missing model file: Ensure MimicMotionMergedUnet_1-0-fp16.safetensors is downloaded and correctly placed.

    • Insufficient memory: Video generation requires significant VRAM; see Resource Requirements below.

  • Performance Optimization:

    • Reduce frame count (adjust frame_load_cap or select_every_nth).

    • Use FP16 precision to lower VRAM usage.

  • Compatibility Issues:

    • Ensure VHS and MimicMotion plugin versions match ComfyUI.

  • Resource Requirements:

    • Minimum 8GB VRAM GPU; 16GB recommended for smooth operation.
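The FP16 tip works because half-precision weights occupy two bytes per parameter instead of four, roughly halving the weight footprint. A back-of-the-envelope estimator (the 1.5B parameter count is a made-up illustration; real VRAM use also includes activations, which grow with frame count and resolution):

```python
def weight_vram_gb(n_params, bytes_per_param):
    """Rough VRAM footprint of the model weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3

fp32 = weight_vram_gb(1.5e9, 4)  # hypothetical 1.5B-parameter UNet
fp16 = weight_vram_gb(1.5e9, 2)
print(f"fp32: {fp32:.2f} GiB, fp16: {fp16:.2f} GiB")
```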