Unlock Multi-View Consistency: AI Model Image Generation Workflow

ComfyUI.org
2025-06-05 10:07:46

1. Workflow Overview


Purpose:
This workflow generates images of a human model that stay consistent across multiple views by:

  • Using ControlNet for pose control and FLUX for detail enhancement

  • Generating 4 coherent views (left/front/back/right) from an input pose map

  • Using TTP tiled processing for high-resolution generation

Core Models:

  • FLUX.1-dev: Base text-to-image model for high-quality generation (the ControlNet and LoRA listed below are both FLUX.1 models)

  • ControlNet (Depth): Controls pose and composition via depth maps

  • Florence-2: Generates image captions to refine prompts

  • FluxGuidance: Sets the FLUX guidance strength, which the workflow relies on to keep the character consistent across views


2. Key Components & Installation

Required Nodes:

  • ControlNetApplySD3: Applies ControlNet constraints (requires FLUX.1-dev-Controlnet-Depth model)

  • FluxGuidance: Sets the FLUX guidance strength used to keep the character consistent (core ComfyUI node; update ComfyUI if it is missing)

  • TTP_Image_Tile_Batch / TTP_Image_Assy: Split large images into tiles and reassemble them (install ComfyUI_TTP_Toolset via ComfyUI Manager)

  • Florence2Run: Generates image captions (requires the Florence-2 model from Hugging Face)

Dependencies:

  • LoRA: 苏-FLUX小红书极致真实_v1.0, roughly "Su FLUX Xiaohongshu Ultra-Realistic" (place in models/loras)

  • ControlNet Model: FLUX.1-dev-Controlnet-Depth-InstantX.safetensors (place in models/controlnet)
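Before loading the workflow, it can help to confirm that both dependency files sit where ComfyUI looks for them. The sketch below is a minimal check, assuming a local install rooted at a folder named `ComfyUI` and assuming the LoRA file uses a `.safetensors` extension; `REQUIRED_FILES` and `missing_models` are hypothetical helpers, not part of ComfyUI.

```python
from pathlib import Path

# Hypothetical manifest: file names from the Dependencies list; the LoRA's
# ".safetensors" extension is an assumption.
REQUIRED_FILES = {
    "loras": ["苏-FLUX小红书极致真实_v1.0.safetensors"],
    "controlnet": ["FLUX.1-dev-Controlnet-Depth-InstantX.safetensors"],
}


def missing_models(comfy_root: str) -> list[str]:
    """Return models/<subdir>/<file> paths that do not exist under comfy_root."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for subdir, names in REQUIRED_FILES.items():
        for name in names:
            if not (models_dir / subdir / name).is_file():
                missing.append(f"models/{subdir}/{name}")
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; adjust to your setup.
    for rel in missing_models("ComfyUI"):
        print("missing:", rel)
```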


3. Workflow Structure

Group 1: Input Control

  • Nodes: LoadImage (pose map), easy positive (prompts)

  • Inputs: Pose map (e.g., skeleton image), character description prompts

  • Outputs: Encoded conditioning vectors

Group 2: Image Generation

  • Nodes: KSampler + ControlNetApplySD3 + FluxGuidance

  • Logic: Samples latent images with the FLUX model; ControlNet enforces the input pose and FluxGuidance sets the guidance strength

Group 3: Tiled Processing (TTP Tile)

  • Nodes: TTP_Image_Tile_Batch, SamplerCustomAdvanced, TTP_Image_Assy

  • Function: Splits the high-resolution image into tiles so each tile can be sampled separately within available VRAM, then reassembles the tiles into one image
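The split, sample, reassemble idea behind this group can be sketched in plain Python. This is not the TTP implementation: tiles are cut from a 2D list of pixel values, the helpers `tile_starts`, `split` and `assemble` are hypothetical, and overlapping pixels are simply averaged, whereas real tile assemblers typically feather the seams.

```python
import math


def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets along one axis so tiles of size `tile` cover [0, length),
    with neighbouring tiles sharing at least `overlap` pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if tile >= length:
        return [0]  # one (clipped) tile covers the whole axis
    step = tile - overlap
    n = math.ceil((length - tile) / step) + 1
    # Clamp the last start so the final tile ends exactly at the border.
    return [min(i * step, length - tile) for i in range(n)]


def split(image, tile, overlap):
    """Cut a 2D grid (list of rows) into overlapping ((top, left), patch) tiles."""
    h, w = len(image), len(image[0])
    th, tw = min(tile, h), min(tile, w)
    tiles = []
    for y in tile_starts(h, tile, overlap):
        for x in tile_starts(w, tile, overlap):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            tiles.append(((y, x), patch))
    return tiles


def assemble(tiles, h, w):
    """Paste tiles back into an h x w grid, averaging where tiles overlap."""
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for (y, x), patch in tiles:
        for dy, row in enumerate(patch):
            for dx, value in enumerate(row):
                acc[y + dy][x + dx] += value
                cnt[y + dy][x + dx] += 1
    return [[acc[i][j] / cnt[i][j] for j in range(w)] for i in range(h)]


# Usage: a 7x10 toy "image" split into 4x4 tiles with a 1-pixel overlap.
image = [[float(10 * r + c) for c in range(10)] for r in range(7)]
tiles = split(image, tile=4, overlap=1)
# ...in the workflow, each patch would be re-sampled at this point...
restored = assemble(tiles, 7, 10)
```

Overlap matters because each tile is sampled independently; shared borders give the assembler a region to blend across, which hides seams between tiles.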

Group 4: Post-Processing

  • Nodes: ImageCrop+ (view cropping), Image Overlay (multi-view merge), SaveImage

  • Output: Final PNG with 4 aligned model views
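Cropping the combined image into individual views reduces to computing one box per view. A minimal sketch, assuming the four views sit side by side at equal widths in the left/front/back/right order listed under Output; the 1152x896 sheet size and the `view_boxes` helper are illustrative assumptions, not taken from the workflow's ImageCrop+ settings.

```python
# Assumption: four equal-width views laid out horizontally in this order.
VIEW_NAMES = ("left", "front", "back", "right")


def view_boxes(width: int, height: int, names=VIEW_NAMES) -> dict:
    """Equal-width crop boxes (left, top, right, bottom), one per view."""
    n = len(names)
    return {
        name: (i * width // n, 0, (i + 1) * width // n, height)
        for i, name in enumerate(names)
    }


if __name__ == "__main__":
    for name, box in view_boxes(1152, 896).items():
        print(name, box)
```

The (left, top, right, bottom) tuples follow the same box convention as Pillow's `Image.crop`, so they can be passed to it directly if you post-process the saved PNG outside ComfyUI.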


4. Inputs & Outputs

Input Parameters:

  • Required: Pose map (e.g., POSE2.png), positive prompts

  • Optional: Seed value, resolution (default 1152x896), ControlNet strength (default 0.6)
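When the workflow is exported in ComfyUI's API format (a JSON object mapping node IDs to a `class_type` and its `inputs`), the optional parameters above can be changed programmatically before queueing. The sketch assumes such an export; the node IDs, the `EmptyLatentImage` node and the trimmed `inputs` shown here are hypothetical stand-ins, since a real export also contains the links between nodes.

```python
import copy
import json


def set_inputs(workflow: dict, class_type: str, **values) -> dict:
    """Return a copy of an API-format workflow with `values` written into the
    inputs of every node whose class_type matches."""
    patched = copy.deepcopy(workflow)
    hits = 0
    for node in patched.values():
        if node.get("class_type") == class_type:
            node["inputs"].update(values)
            hits += 1
    if hits == 0:
        raise KeyError(f"no node of type {class_type!r} in workflow")
    return patched


# Hypothetical, heavily trimmed export (node IDs and links are illustrative).
workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 20}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1152, "height": 896, "batch_size": 1}},
    "12": {"class_type": "ControlNetApplySD3", "inputs": {"strength": 0.6}},
}

wf = set_inputs(workflow, "KSampler", seed=42)
wf = set_inputs(wf, "ControlNetApplySD3", strength=0.5)
payload = json.dumps({"prompt": wf})
```

The resulting JSON string is the kind of body a local ComfyUI server accepts on its `/prompt` endpoint; copying instead of mutating keeps the original export reusable for further variations.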

Output:

  • A single PNG with 4 model views (left/front/back/right), saved to ComfyUI/output


5. Notes

  • VRAM: A GPU with ≥12 GB of VRAM is recommended (reduce the tile size to lower usage)

  • Troubleshooting:

    • Missing ControlNet model → Download to models/controlnet

    • FluxGuidance node not found → Update ComfyUI (FluxGuidance ships with ComfyUI core)

  • Optimization:

    • Lower the TTP tile size (e.g., 512x512) to reduce peak VRAM; smaller tiles mean more sampling passes, so generation takes longer

    • Use easy cleanGpuUsed to free VRAM manually
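The tile-size advice is a trade-off between memory per tile and the number of sampling passes. The arithmetic below uses the default 1152x896 resolution purely as an example and assumes a hypothetical 64-pixel overlap; under those assumptions, going from 768 to 512 pixel tiles cuts the pixels per tile by more than half but raises the tile count from 4 to 6. The helper names are illustrative.

```python
import math


def tiles_per_axis(length: int, tile: int, overlap: int) -> int:
    """Tiles needed to cover one axis when neighbours share `overlap` pixels."""
    if tile >= length:
        return 1
    return math.ceil((length - tile) / (tile - overlap)) + 1


def tile_count(width: int, height: int, tile: int, overlap: int = 64) -> int:
    """Total tiles for a width x height image (64 px overlap is an assumption)."""
    return tiles_per_axis(width, tile, overlap) * tiles_per_axis(height, tile, overlap)


if __name__ == "__main__":
    for tile in (1024, 768, 512):
        n = tile_count(1152, 896, tile)
        px = min(tile, 1152) * min(tile, 896)  # a tile is clipped to the image
        print(f"tile {tile}px: {n} tiles, {px:,} px per tile")
```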