Transform Your Product Images with AI: A Comprehensive Workflow
1. Workflow Overview

This workflow specializes in AI-powered e-commerce image enhancement, featuring:
Product Style Transfer: Transform basic product shots into premium visuals
Intelligent Inpainting: Automatically fill transparent/missing areas
Multi-Modal Control: Precise generation via text prompts + reference images
Auto-Captioning: AI-generated product descriptions
Key Applications:
E-commerce listing image enhancement
Ad creatives generation
Product background replacement
2. Core Models
Model | Function | Source | Critical Parameters |
---|---|---|---|
Flux Model Series | Base image generation | Custom | |
Florence-2 | Image understanding | Microsoft | |
Meta-Llama-3 | Caption generation | Meta | |
CLIP-Vision | Image feature extraction | OpenAI | |
3. Critical Nodes Breakdown
Core Processing Nodes
Node | Function | Installation |
---|---|---|
StyleModelApply | Applies style transfer | Requires |
Florence2Run | Image analysis & captioning | Manual Florence-2 plugin install |
Joy_caption_two | Product description generation | |
ImageConcanate | Image composition | Built-in node |
Special Dependencies
Flux Models:
Requires F.1-Fill-fp16 in models/checkpoints
Optimized for product inpainting
Florence-2 Requirements:
pip install "transformers>=4.35.0" torchvision
Captioning Model:
Minimum 8GB VRAM
4-bit quantization recommended
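The quantization recommendation follows from simple arithmetic on weight storage. The sketch below estimates how much memory the weights alone occupy at different precisions. The 8-billion parameter count is an assumption (the text only names Meta-Llama-3, not a size), and the estimate ignores activations, the KV cache, and quantization overhead.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold model weights, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: an 8B-parameter Llama 3 variant; the source does not state the size.
ASSUMED_PARAMS = 8e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gib = weight_memory_gib(ASSUMED_PARAMS, bits)
        print(f"{bits:>2}-bit weights: ~{gib:.1f} GiB")
```

Under that assumption, 16-bit weights need roughly 15 GiB, while 4-bit weights need under 4 GiB, which is what leaves headroom on an 8GB card.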
4. Workflow Architecture
Processing Stages
Stage | Function | Key Nodes |
---|---|---|
Input Prep | Load product image + mask | LoadImage → ImageScaleByAspectRatio |
Style Transfer | Apply premium aesthetics | StyleModelLoader → CLIPVisionEncode |
Inpainting | Fill transparent areas | InpaintModelConditioning → KSampler |
Captioning | Generate product text | Florence2Run → Joy_caption_two |
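The Inpainting stage fills transparent areas, and the workflow's masks use white for the regions to inpaint. As a rough illustration of that convention, the sketch below derives such a mask from an alpha channel. It works on plain nested lists rather than ComfyUI tensors, and the `threshold` parameter is a hypothetical knob, not a setting taken from the workflow.

```python
from typing import List


def mask_from_alpha(alpha: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Return a mask where transparent pixels (alpha < threshold) are white (255)
    and opaque pixels are black (0)."""
    return [[255 if a < threshold else 0 for a in row] for row in alpha]


def inpaint_fraction(mask: List[List[int]]) -> float:
    """Fraction of pixels marked for inpainting (white)."""
    total = sum(len(row) for row in mask)
    white = sum(1 for row in mask for v in row if v == 255)
    return white / total if total else 0.0


if __name__ == "__main__":
    alpha = [[0, 255], [100, 200]]  # 2x2 alpha channel, values 0-255
    mask = mask_from_alpha(alpha)
    print(mask, inpaint_fraction(mask))
```

Checking the white fraction before sampling is a cheap way to catch an inverted or empty mask.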
Data Flow
graph LR
A[Raw Product Image] --> B[Background Separation]
B --> C[Style Transfer]
C --> D[Detail Inpainting]
D --> E[Caption Generation]
E --> F[Final Output]
5. I/O Specifications
Input Requirements
Images:
PNG format with alpha channel preferred
Minimum 1024x1024 resolution
Masks:
Black/white mask (white=inpaint areas)
Example:
clipspace-mask-6389222.png
Text Prompts:
Must include product category + style keywords
Example: "Luxury perfume bottle, minimalist, marble texture"
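The image requirements above can be checked before a file ever reaches the workflow. The following is a minimal sketch that reads only the PNG header using the standard library; the function name `check_png_header` is illustrative, and it treats only PNG color types 4 and 6 as having an alpha channel (palette images with a transparency chunk are not detected).

```python
import struct
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 1024  # minimum resolution stated in the input requirements


def check_png_header(data: bytes) -> List[str]:
    """Return a list of problems found in the first 33 bytes of a PNG file."""
    if len(data) < 33 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return ["not a valid PNG file"]
    # IHDR layout: width (4 bytes), height (4), bit depth (1), color type (1), ...
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    problems = []
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append(f"resolution {width}x{height} is below {MIN_SIDE}x{MIN_SIDE}")
    # Color types 4 (gray + alpha) and 6 (RGBA) carry a full alpha channel.
    if color_type not in (4, 6):
        problems.append("no alpha channel (preferred, not required)")
    return problems
```

To use it, read the first 33 bytes of a file opened in binary mode and pass them in; an empty list means the image meets both stated requirements.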
Outputs
Images:
HD product images (PNG)
Multiple style variants
Text:
Product descriptions (JSON/text)
6. Optimization Guide
VRAM Management:
Use the --medvram flag for 8-12GB GPUs (if your ComfyUI build does not recognize it, --lowvram is the comparable built-in option)
Process large images in batches
Speed Boost:
# Add to custom_nodes/joy_caption/__init__.py:
import torch
torch.backends.cuda.enable_flash_sdp(True)
Troubleshooting:
Uneven style transfer: Adjust StyleModelApply blend (0.3-0.7)
Repetitive captions: Modify Llama-3's repetition_penalty (1.2 recommended)
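To see why raising repetition_penalty reduces repeated phrases, it helps to look at what the setting does to token scores. The sketch below follows the common formulation (the one used by Hugging Face transformers, for example): logits of already-generated tokens are divided by the penalty when positive and multiplied by it when negative. Whether the captioning node applies it in exactly this form is an assumption, and the function name is illustrative.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    penalty: float = 1.2,
) -> List[float]:
    """Lower the scores of tokens that have already been generated."""
    out = list(logits)
    for token_id in set(generated_ids):
        score = out[token_id]
        # Positive scores shrink toward zero; negative scores move further below zero.
        out[token_id] = score / penalty if score > 0 else score * penalty
    return out


if __name__ == "__main__":
    # Tokens 0 and 1 were already generated; token 2 is untouched.
    print(apply_repetition_penalty([2.4, -1.0, 0.5], generated_ids=[0, 1]))
```

A penalty of 1.0 leaves scores unchanged, so values slightly above 1.0, such as the recommended 1.2, discourage repeats without banning them outright.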
7. Deployment Instructions
Step 1: Environment Setup
# Install core dependencies
pip install "git+https://github.com/microsoft/Florence-2.git"
Step 2: Plugin Installation
cd ComfyUI/custom_nodes
git clone https://github.com/JoyCloud/ComfyUI-JoyCaption
Step 3: Model Placement
Florence-2:
models/florence2
Flux models:
models/checkpoints
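A quick way to confirm the placement step is to check that both directories exist and are not empty. This is a minimal sketch using only the standard library; it assumes the two paths are relative to the ComfyUI root directory, and `check_model_dirs` is an illustrative helper, not part of ComfyUI.

```python
from pathlib import Path
from typing import Dict

# Directories named in Step 3; assumed to be relative to the ComfyUI root.
EXPECTED_DIRS = ("models/florence2", "models/checkpoints")


def check_model_dirs(comfy_root: str) -> Dict[str, bool]:
    """Map each expected directory to True if it exists and holds at least one entry."""
    root = Path(comfy_root)
    status = {}
    for rel in EXPECTED_DIRS:
        directory = root / rel
        status[rel] = directory.is_dir() and any(directory.iterdir())
    return status


if __name__ == "__main__":
    for rel, ok in check_model_dirs("ComfyUI").items():
        print(f"{rel}: {'ok' if ok else 'missing or empty'}")
```

The check only looks for the presence of files, not for specific model filenames.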
Verification
# Check CUDA acceleration
import torch
print(torch.cuda.is_available())
Real-World Use Case
Scenario: Watch product image upgrade
Input: White-background watch image + mask
Process:
Converts to gold/black luxury style
Generates leather-texture background
Outputs caption: "Luxury mechanical watch, 18K gold bezel, alligator strap"
Processing Time: ~45s (RTX 3090)
Note: Ideal for e-commerce teams that need batch processing; it can cut manual editing time by 80% or more.