Revamp Your Visuals: Inpainting, Style Transfer, and Auto-Captioning Made Easy
1. Workflow Overview

This is an SDXL-based inpainting and style transfer workflow designed for:
Local Inpainting: Fix defects or remove unwanted elements (e.g., watermarks).
Style Transfer: Apply reference image styles via IPAdapter and ControlNet.
Auto-Captioning: Generate prompts using Meta-Llama-3 for better text guidance.
2. Core Models
Model Name | Function |
---|---|
DreamShaper XL v2.1 Turbo | Main checkpoint for high-quality image generation. |
xinsir_controlnet_depth_sdxl | ControlNet model for depth-aware structure control. |
IPAdapter (PLUS) | Style transfer by injecting reference image features. |
MAT_Places512_G_fp16 | Inpainting-specific model for optimal repairs. |
3. Key Nodes
Node Name | Function | Installation |
---|---|---|
| Loads Fooocus inpainting models. | Via ComfyUI Manager |
| Advanced style transfer control. | Requires |
| Applies ControlNet depth conditioning. | Built-in (model download required). |
| Generates captions using Llama-3. | Install |
| Auto-calculates optimal inpainting resolution. | Built-in. |
Dependencies:
IPAdapter Models: Download
ip-adapter-plus_sdxl_vit-h.bin
tomodels/ipadapter
.ControlNet Models: Place
xinsir_controlnet_depth_sdxl_1.0.safetensors
inmodels/controlnet
.
4. Workflow Structure
Group 1: Auto-Captioning
Input: Original image (via
LoadImage
).Output: Generated caption text.
Key Nodes:
Joy_caption_load
,Joy_caption
.
Group 2: Preprocessing
Input: Image + mask.
Output: Cropped image and expanded mask.
Key Nodes:
CutForInpaint
,GrowMask
.
Group 3: Inpainting & Style Transfer
Input: Preprocessed image + ControlNet depth + IPAdapter reference.
Output: Repaired image.
Key Nodes:
KSampler
,IPAdapterAdvanced
,INPAINT_ApplyFooocusInpaint
.
Group 4: Post-Processing
Input: Original and inpainted images.
Output: Blended result (
BlendInpaint
) and comparison view (Image Comparer
).
5. Input & Output
Input Parameters:
Image: Must include a mask (marking areas to repair).
Resolution: Default
2048x2048
(auto-optimized byPixelPerfectResolution
).Prompts: Auto-generated by
Joy_caption
or manually entered.
Output: Final inpainted image (PNG) with comparison tool.
6. Notes
VRAM: ≥12GB GPU recommended (SDXL + ControlNet + IPAdapter are resource-heavy).
Common Errors:
Missing models trigger red error boxes—verify model paths.
Incorrect mask coverage leads to artifacts.
Optimization:
Enable
xformers
for faster generation.Reduce
KSampler
steps (e.g., from 30 to 20) to save time.