workflow

Model	Function	Source	Critical Parameters
Flux Model Series	Base image generation	Custom	`F.1-Fill-fp16` (inpainting specialty)
Florence-2	Image understanding	Microsoft	`microsoft/Florence-2-base`
Meta-Llama-3	Caption generation	Meta	`unsloth/Meta-Llama-3.1-8B` (4-bit quantized)
CLIP-Vision	Image feature extraction	Google (SigLIP)	`sigclip_vision_patch14_384`

3. Critical Nodes Breakdown

Core Processing Nodes

Node	Function	Installation
StyleModelApply	Applies style transfer	Requires `ComfyUI-StyleModels`
Florence2Run	Image analysis & captioning	Manual Florence-2 plugin install
Joy_caption_two	Product description generation	`ComfyUI-JoyCaption` plugin
ImageConcanate	Image composition	Built-in node

Special Dependencies

Flux Models:
- Requires F.1-Fill-fp16 in models/checkpoints
- Optimized for product inpainting

Florence-2 Requirements:

pip install "transformers>=4.35.0" torchvision

Captioning Model:
- Minimum 8GB VRAM
- 4-bit quantization recommended
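
The 4-bit recommendation follows from simple arithmetic on the size of the weights. The sketch below estimates memory for the model weights alone. The figure of 8 billion parameters is an assumption read off the "8B" in the model name, and activations, the KV cache, and quantization metadata are ignored, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 2**30


# Assumption: roughly 8e9 parameters, as implied by the "8B" in the model name.
PARAMS = 8e9

for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions the 16-bit weights come to about 14.9 GiB, which does not fit in 8GB of VRAM, while the 4-bit weights come to about 3.7 GiB.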

4. Workflow Architecture

Processing Stages

Stage	Function	Key Nodes
Input Prep	Load product image + mask	LoadImage → ImageScaleByAspectRatio
Style Transfer	Apply premium aesthetics	StyleModelLoader → CLIPVisionEncode
Inpainting	Fill transparent areas	InpaintModelConditioning → KSampler
Captioning	Generate product text	Florence2Run → Joy_caption_two

Data Flow

graph LR
A[Raw Product Image] --> B[Background Separation]
B --> C[Style Transfer]
C --> D[Detail Inpainting]
D --> E[Caption Generation]
E --> F[Final Output]
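
The data flow above is a strictly linear chain, which can be sketched as an ordered list of stage functions that each take and return a shared state. This is only an illustration of the ordering: `make_stage`, `PIPELINE`, and `run_pipeline` are hypothetical names, and the stage bodies are placeholders rather than the actual ComfyUI nodes.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


def make_stage(label: str) -> Callable[[State], State]:
    """Build a placeholder stage that only records that it ran."""
    def stage(state: State) -> State:
        completed = list(state["completed"]) + [label]
        return {**state, "completed": completed}
    return stage


# Stage names follow the data-flow graph; the bodies are placeholders.
PIPELINE: List[Tuple[str, Callable[[State], State]]] = [
    (name, make_stage(name))
    for name in (
        "Background Separation",
        "Style Transfer",
        "Detail Inpainting",
        "Caption Generation",
    )
]


def run_pipeline(image_path: str) -> State:
    """Pass the state through every stage in order and return the result."""
    state: State = {"image": image_path, "completed": []}
    for _name, stage in PIPELINE:
        state = stage(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("watch.png")["completed"])
```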

5. I/O Specifications

Input Requirements

Images:
- PNG format with alpha channel preferred
- Minimum 1024x1024 resolution
Masks:
- Black/white mask (white=inpaint areas)
- Example: clipspace-mask-6389222.png
Text Prompts:
- Must include product category + style keywords
- Example: "Luxury perfume bottle, minimalist, marble texture"
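
The image requirements above can be checked before a file is queued. The sketch below reads only the PNG header with the standard library, so it needs no imaging package. The function names are hypothetical, and it only recognizes a full alpha channel (PNG color types 4 and 6); palette images that carry transparency in a separate chunk would be reported as having no alpha.

```python
import struct
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG color types with a full alpha channel: 4 = gray+alpha, 6 = RGBA.
ALPHA_COLOR_TYPES = {4, 6}


def check_png_header(data: bytes, min_side: int = 1024) -> List[str]:
    """Return a list of problems found in the first 26 bytes of a PNG file."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return ["not a PNG file"]
    # IHDR payload starts at byte 16: width, height, bit depth, color type.
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    problems = []
    if min(width, height) < min_side:
        problems.append(f"resolution {width}x{height} is below {min_side}x{min_side}")
    if color_type not in ALPHA_COLOR_TYPES:
        problems.append("no alpha channel (preferred, not required)")
    return problems


def check_png_file(path: str, min_side: int = 1024) -> List[str]:
    """Run the header check against a file on disk."""
    with open(path, "rb") as handle:
        return check_png_header(handle.read(26), min_side)
```

An empty list means the file meets both the resolution minimum and the alpha-channel preference.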

Outputs

Images:
- HD product images (PNG)
- Multiple style variants
Text:
- Product descriptions (JSON/text)

6. Optimization Guide

VRAM Management:
- Use the --lowvram launch flag on 8-12GB GPUs (ComfyUI has no --medvram option)
- Process large images in batches
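
One reading of the batching advice is to queue only a few images at a time so that peak VRAM stays bounded. A minimal sketch of that idea follows; `batched` is a hypothetical helper, and how each batch is handed to the workflow is left out because it depends on your setup.

```python
from typing import Iterator, List, Sequence


def batched(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


if __name__ == "__main__":
    images = ["a.png", "b.png", "c.png", "d.png", "e.png"]
    for batch in batched(images, 2):
        # Placeholder: submit this batch to the workflow here.
        print(batch)
```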

Speed Boost:

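# Add to custom_nodes/joy_caption/__init__.py:
import torch  # needed if the file does not already import it
torch.backends.cuda.enable_flash_sdp(True)  # allow the FlashAttention kernel for scaled dot-product attention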

Troubleshooting:
- Uneven style transfer: Adjust StyleModelApply blend (0.3-0.7)
- Repetitive captions: Modify Llama-3's repetition_penalty (1.2 recommended)
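
To see why a repetition_penalty above 1.0 reduces repeated phrases, the sketch below applies the rule used by Hugging Face-style generation: the score of every token that has already been emitted is pushed down before the next token is chosen. It assumes the captioning node forwards repetition_penalty to such a generation call, which this guide does not confirm, and it works on plain Python lists rather than tensors.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float],
    seen_token_ids: Iterable[int],
    penalty: float = 1.2,
) -> List[float]:
    """Return a copy of `logits` with already-seen tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink, negative scores grow more negative.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


if __name__ == "__main__":
    # Tokens 0 and 1 were already generated; token 2 is untouched.
    print(apply_repetition_penalty([2.4, -1.0, 0.5], [0, 1]))
```

A penalty of 1.0 leaves the scores unchanged, so raising it toward 1.2 trades some fluency for less repetition.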

7. Deployment Instructions

Step 1: Environment Setup

# Install core dependencies
pip install "git+https://github.com/microsoft/Florence-2.git"

Step 2: Plugin Installation

cd ComfyUI/custom_nodes
git clone https://github.com/JoyCloud/ComfyUI-JoyCaption

Step 3: Model Placement

Florence-2: models/florence2
Flux models: models/checkpoints

Verification

# Check CUDA acceleration
import torch
print(torch.cuda.is_available())

Real-World Use Case

Scenario: Watch product image upgrade

Input: White-background watch image + mask
Process:
- Converts to gold/black luxury style
- Generates leather-texture background
- Outputs caption: "Luxury mechanical watch, 18K gold bezel, alligator strap"
Processing Time: ~45s (RTX 3090)

Note: Ideal for e-commerce teams that need batch processing; it can cut manual editing time by 80% or more.


Summary

Unlock AI-powered e-commerce image enhancement with our workflow, featuring style transfer, intelligent inpainting, multi-modal control, and auto-captioning for premium product visuals and ad creatives.

Chapter

workflow:

CustomNodes:

CLIPTextEncode MaskToImage Imp...