Unleash AI-Powered Video Character Redraw: Transforming Videos with Style

ComfyUI.org
2025-04-10 20:49:19

1. Workflow Overview


This workflow, named “wan2.1Fun_Video Character Redraw”, uses AI models to redraw the characters in a video in a new style, producing stylized images or videos. Key technologies include:

  • Frame Extraction: Extracts frames from the input video.

  • Segmentation & Pose Detection: Uses GroundingDINO + SAM to segment people and OpenPose to extract pose keypoints.

  • Text/Image-Guided Generation: Generates new content with the Wan2.1-Fun-Control video diffusion model.

  • Video Synthesis: Combines frames into a final video.
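Frame extraction boils down to choosing which frame indices to keep. The sketch below is hypothetical: its parameters mirror VHS_LoadVideo's skip_first_frames, select_every_nth, and frame_load_cap options. The 4k+1 trimming step rests on an assumption: Wan2.1's video VAE compresses time by a factor of 4, which is why clip lengths such as 81 frames are used.

```python
from typing import List


def select_frame_indices(total_frames: int,
                         skip_first_frames: int = 0,
                         select_every_nth: int = 1,
                         frame_load_cap: int = 0) -> List[int]:
    """Hypothetical helper: indices of the frames a loader would keep.

    frame_load_cap=0 means "no cap", mirroring the common convention.
    """
    indices = list(range(skip_first_frames, total_frames, select_every_nth))
    if frame_load_cap > 0:
        indices = indices[:frame_load_cap]
    return indices


def trim_to_4k_plus_1(indices: List[int]) -> List[int]:
    """Drop trailing frames so the clip length has the form 4k + 1.

    Assumption: the video model compresses time 4x and keeps the first frame
    separately, so lengths like 49 or 81 fit its latent grid exactly.
    """
    if not indices:
        return []
    usable = ((len(indices) - 1) // 4) * 4 + 1
    return indices[:usable]
```

For example, capping the 25 fps input at 81 frames yields about 3.2 seconds of footage, and 81 is already of the form 4k+1, so nothing is trimmed.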

2. Core Models

  1. Wan2.1-Fun-Control-14B (video diffusion model)

    • Purpose: Generates videos (or single frames) from text/image prompts, guided by control inputs such as pose.

    • Model File: Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors.

  2. GroundingDino + SAM

    • Purpose: Detects and segments characters from a text label (e.g., "man").

    • Model Files: GroundingDINO_SwinT_OGC, sam_vit_b_01ec64.pth.

  3. ControlNet (Openpose)

    • Purpose: Preserves original pose structure.

    • Model File: control_v11p_sd15_openpose.pth.

  4. Florence2

    • Purpose: Auto-generates image captions (prompt inversion).

    • Model File: Florence-2-large.
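The fp8_e4m3fn suffix on the Wan2.1 file name indicates 8-bit weights, so each parameter takes 1 byte. The arithmetic below estimates the size of the weights alone. It assumes a nominal 14e9 parameters for the "14B" model and ignores activations, the text encoder, the VAE, and framework overhead. By this estimate the weights need roughly 13 GiB in FP8 and 26 GiB in FP16.

```python
# Bytes per parameter for common weight formats; fp8_e4m3fn stores 1 byte per weight.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8_e4m3fn": 1}


def weight_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights alone, in GiB (2**30 bytes).

    Excludes activations, text encoder, VAE, and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    nominal_params = 14e9  # assumption: nominal "14B" parameter count
    for dtype in ("fp16", "fp8_e4m3fn"):
        print(f"{dtype}: {weight_gib(nominal_params, dtype):.1f} GiB")
```

Because the FP8 weights alone exceed 12 GB, a card at the low end of the recommended VRAM range will generally have to keep part of the model outside GPU memory, for example by offloading to system RAM.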

3. Key Nodes

  • Video Input:

    • VHS_LoadVideo: Loads video files (e.g., 2795746-uhd_2160_3840_25fps.mp4).

  • Character Processing:

    • GroundingDinoSAMSegment: Segments characters and generates masks.

    • OpenposePreprocessor: Extracts pose keypoints.

  • Generation Control:

    • WanVideoTextEncode: Processes text prompts (e.g., "futuristic robot").

    • WanVideoSampler: Runs the diffusion sampling loop (e.g., steps=25, CFG=8).

  • Output Synthesis:

    • VHS_VideoCombine: Combines frames into MP4 (H.264).
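The steps and CFG values passed to WanVideoSampler can be illustrated with a toy sampling loop. This is not the wrapper's code. It assumes a flow-matching style model that predicts a velocity, integrated from t=1 (noise) to t=0 with plain Euler steps. It also applies the standard classifier-free guidance combination, guided = uncond + cfg * (cond - uncond). The model argument is a hypothetical stand-in callable.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical model signature: (latent, timestep, use_prompt) -> predicted velocity
Model = Callable[[Vector, float, bool], Vector]


def cfg_combine(uncond: Vector, cond: Vector, cfg: float) -> Vector:
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward the prompt-conditioned one. cfg=1 returns the conditional prediction."""
    return [u + cfg * (c - u) for u, c in zip(uncond, cond)]


def euler_sample(x: Vector, model: Model, steps: int = 25, cfg: float = 8.0) -> Vector:
    """Integrate dx/dt = v from t=1 down to t=0 in `steps` equal Euler steps."""
    dt = 1.0 / steps
    t = 1.0
    for _ in range(steps):
        # Two model evaluations per step: without and with the text prompt.
        v = cfg_combine(model(x, t, False), model(x, t, True), cfg)
        # Time runs backward (1 -> 0), so subtract dt * v.
        x = [xi - dt * vi for xi, vi in zip(x, v)]
        t -= dt
    return x
```

In this sketch, CFG doubles the model evaluations per step, so 25 steps at CFG 8 cost 50 forward passes. Higher CFG values follow the prompt more strongly but can exaggerate artifacts.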

4. Workflow Structure (Groups)

  1. Frame Redraw (Text-Based)

    • Input: Video + text prompts.

    • Output: Redrawn first frame.

  2. Wan2.1 Character Conversion

    • Input: Masks + pose data.

    • Output: Stylized video.

  3. Prompt Inversion (Florence2)

    • Input: Reference image.

    • Output: Auto-generated detailed caption.

5. Inputs & Outputs

  • Inputs:

    • Video file (MP4).

    • Optional text prompts.

    • Generation parameters (e.g., 512x910 resolution, Euler sampler).

  • Output:

    • Generated video (e.g., AnimateDiff_00027.mp4).
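The 512x910 resolution appears to preserve the source clip's aspect ratio. The example file name 2795746-uhd_2160_3840_25fps.mp4 suggests a 2160×3840 portrait video, and 512 × 3840 / 2160 ≈ 910.2. The helper below is hypothetical. Its optional rounding to a multiple of 16 reflects an assumption that latent video models prefer dimensions divisible by their downsampling factor; the workflow does not document this as a requirement.

```python
from typing import Tuple


def scaled_size(src_w: int, src_h: int, target_w: int, multiple: int = 1) -> Tuple[int, int]:
    """Scale to target_w while keeping the aspect ratio.

    The height is rounded to the nearest multiple of `multiple`
    (multiple=1 means plain rounding to an integer).
    """
    exact_h = target_w * src_h / src_w
    return target_w, round(exact_h / multiple) * multiple
```

With multiple=16, the same 2160×3840 source maps to 512×912 instead of 512×910.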

6. Notes

  1. Dependencies:

    • Install via ComfyUI Manager:

      • ComfyUI-WanVideoWrapper (video generation).

      • comfyui_controlnet_aux (pose extraction).

      • comfyui-florence2 (prompt inversion).

  2. Hardware:

    • Recommended VRAM: ≥12 GB (the 14B Wan2.1 model is large, even in FP8).

  3. Troubleshooting:

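    • Model path errors: Verify that each .safetensors/.pth file sits in the folder its loader node reads from.

    • Video encoding issues: Adjust the CRF value in VHS_VideoCombine (default 19; lower values mean higher quality and larger files).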
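For model path errors, a quick pre-flight check can list which expected files are missing. This is only a sketch. The models root and subfolder names below are placeholders, not paths confirmed by the workflow or the custom nodes, so adjust them to match where each loader node actually looks.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: placeholder root and subfolders; check your own ComfyUI install.
MODELS_ROOT = "ComfyUI/models"
EXPECTED_FILES = [
    "diffusion_models/Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors",
    "sams/sam_vit_b_01ec64.pth",
    "controlnet/control_v11p_sd15_openpose.pth",
]


def missing_model_files(models_root: str, expected: Iterable[str]) -> List[str]:
    """Return the expected relative paths that are not regular files under models_root."""
    root = Path(models_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_model_files(MODELS_ROOT, EXPECTED_FILES):
        print("missing:", rel)
```

An empty result only means the files exist at those paths. It does not guarantee that the loader nodes are configured to read from them.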