Transform Your Product Images with AI: A Comprehensive Workflow

ComfyUI.org
2025-03-25 10:44:13

1. Workflow Overview


This workflow specializes in AI-powered e-commerce image enhancement, featuring:

  1. Product Style Transfer: Transform basic product shots into premium visuals

  2. Intelligent Inpainting: Automatically fill transparent/missing areas

  3. Multi-Modal Control: Precise generation via text prompts + reference images

  4. Auto-Captioning: AI-generated product descriptions

Key Applications:

  • E-commerce listing image enhancement

  • Ad creatives generation

  • Product background replacement


2. Core Models

| Model | Function | Source | Critical Parameters |
| --- | --- | --- | --- |
| Flux Model Series | Base image generation | Custom | F.1-Fill-fp16 (inpainting specialty) |
| Florence-2 | Image understanding | Microsoft | microsoft/Florence-2-base |
| Meta-Llama-3 | Caption generation | Meta | unsloth/Meta-Llama-3.1-8B (4-bit quantized) |
| CLIP-Vision | Image feature extraction | Google (SigLIP) | sigclip_vision_patch14_384 |


3. Critical Nodes Breakdown

Core Processing Nodes

| Node | Function | Installation |
| --- | --- | --- |
| StyleModelApply | Applies style transfer | Requires ComfyUI-StyleModels |
| Florence2Run | Image analysis & captioning | Manual Florence-2 plugin install |
| Joy_caption_two | Product description generation | ComfyUI-JoyCaption plugin |
| ImageConcanate | Image composition | Built-in node |

Special Dependencies

  1. Flux Models:

    • Requires F.1-Fill-fp16 in models/checkpoints

    • Optimized for product inpainting

  2. Florence-2 Requirements:

    ```bash
    pip install "transformers>=4.35.0" torchvision
    ```
  3. Captioning Model:

    • Minimum 8GB VRAM

    • Recommended 4bit quantization
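The `transformers>=4.35.0` requirement above can be checked programmatically before loading Florence-2. A minimal sketch, assuming plain numeric version strings; `meets_minimum` is an illustrative helper, not part of any plugin:

```python
def parse_version(v: str) -> tuple:
    # Split "4.36.2" into (4, 36, 2); assumes purely numeric components
    return tuple(int(p) for p in v.split(".")[:3])

def meets_minimum(installed: str, required: str = "4.35.0") -> bool:
    # Tuple comparison gives correct ordering, e.g. (4, 9) < (4, 35)
    return parse_version(installed) >= parse_version(required)
```

In practice you would pass `transformers.__version__` as `installed` and fail fast with a clear error if the check returns `False`.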


4. Workflow Architecture

Processing Stages

| Stage | Function | Key Nodes |
| --- | --- | --- |
| Input Prep | Load product image + mask | LoadImage → ImageScaleByAspectRatio |
| Style Transfer | Apply premium aesthetics | StyleModelLoader → CLIPVisionEncode |
| Inpainting | Fill transparent areas | InpaintModelConditioning → KSampler |
| Captioning | Generate product text | Florence2Run → Joy_caption_two |

Data Flow

```mermaid
graph LR
A[Raw Product Image] --> B[Background Separation]
B --> C[Style Transfer]
C --> D[Detail Inpainting]
D --> E[Caption Generation]
E --> F[Final Output]
```
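The linear flow above can be sketched as a chain of stage functions. Each function below is a stub standing in for a group of ComfyUI nodes; none of these names exist in ComfyUI itself:

```python
# Hypothetical stubs: each appends its stage name to a running trace,
# standing in for the corresponding ComfyUI node group.
def separate_background(state): return state + ["background_separation"]
def transfer_style(state):      return state + ["style_transfer"]
def inpaint_details(state):     return state + ["detail_inpainting"]
def generate_caption(state):    return state + ["caption_generation"]

def run_pipeline(state):
    # Stages execute strictly in order; each consumes the previous output
    for stage in (separate_background, transfer_style,
                  inpaint_details, generate_caption):
        state = stage(state)
    return state
```

Because each stage only depends on the previous stage's output, swapping or disabling a stage (e.g. skipping captioning) is a local change.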

5. I/O Specifications

Input Requirements

  1. Images:

    • PNG format with alpha channel preferred

    • Minimum 1024x1024 resolution

  2. Masks:

    • Black/white mask (white=inpaint areas)

    • Example: clipspace-mask-6389222.png

  3. Text Prompts:

    • Must include product category + style keywords

    • Example: "Luxury perfume bottle, minimalist, marble texture"
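The mask convention above (white = inpaint) can be derived directly from a PNG's alpha channel. A minimal sketch assuming Pillow is installed; `alpha_to_inpaint_mask` is an illustrative helper, not a workflow node:

```python
from PIL import Image

def alpha_to_inpaint_mask(img: Image.Image) -> Image.Image:
    """Build a black/white inpaint mask from an image's alpha channel.

    White (255) marks transparent pixels to be inpainted;
    opaque pixels become black (0) and are left untouched.
    """
    if img.mode != "RGBA":
        img = img.convert("RGBA")
    alpha = img.split()[-1]  # the alpha band as an "L" (grayscale) image
    return alpha.point(lambda a: 255 if a == 0 else 0)
```

Saving the result as PNG yields a mask in the same format as the `clipspace-mask-*.png` example above.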

Outputs

  • Images:

    • HD product images (PNG)

    • Multiple style variants

  • Text:

    • Product descriptions (JSON/text)


6. Optimization Guide

  1. VRAM Management:

    • Use --medvram flag for 8-12GB GPUs

    • Process large images in batches

  2. Speed Boost:

    ```python
    # Add to custom_nodes/joy_caption/__init__.py:
    import torch
    torch.backends.cuda.enable_flash_sdp(True)
    ```
  3. Troubleshooting:

    • Uneven style transfer: Adjust StyleModelApply blend (0.3-0.7)

    • Repetitive captions: Modify Llama-3's repetition_penalty (1.2 recommended)
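To see why raising `repetition_penalty` reduces repetitive captions, here is a sketch of the standard penalty rule used by Hugging Face `transformers` (from the CTRL paper): logits of already-generated tokens are shrunk, making their re-selection less likely. This is a pure-Python illustration, not the library's actual implementation:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Penalize tokens that already appear in the generated sequence.

    Positive logits are divided by the penalty, negative ones multiplied,
    so a penalized token's score always moves away from selection.
    """
    out = list(logits)
    for tok in set(generated_ids):
        score = out[tok]
        out[tok] = score / penalty if score > 0 else score * penalty
    return out
```

With `penalty=1.0` the logits are unchanged; values above 1.0 (1.2 recommended here) increasingly discourage repeats.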


7. Deployment Instructions

Step 1: Environment Setup

```bash
# Install core dependencies
pip install "git+https://github.com/microsoft/Florence-2.git"
```

Step 2: Plugin Installation

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/JoyCloud/ComfyUI-JoyCaption
```

Step 3: Model Placement

  • Florence-2: models/florence2

  • Flux models: models/checkpoints

Verification

```python
# Check CUDA acceleration
import torch
print(torch.cuda.is_available())
```

Real-World Use Case

Scenario: Watch product image upgrade

  1. Input: White-background watch image + mask

  2. Process:

    • Converts to gold/black luxury style

    • Generates leather-texture background

    • Outputs caption: "Luxury mechanical watch, 18K gold bezel, alligator strap"

  3. Processing Time: ~45s (RTX 3090)

Note: Ideal for e-commerce teams that need batch processing; it can cut manual editing time by 80% or more.