Boost Your Visual Content with an AI-Driven Image Generation Workflow

ComfyUI.org
2025-03-25 10:53:22

1. Workflow Overview


This workflow specializes in controlled image generation with:

  1. Multi-Modal Inputs: Text prompts + reference images

  2. Precision Control: Flux sampling parameters + LoRA adjustments

  3. Bilingual Processing: Integrated Baidu translation for prompts

  4. Comparative Outputs: Side-by-side result analysis

Key Applications:

  • Advertising material generation

  • Social media content creation

  • Product visualization


2. Core Models

| Model | Function | Source | Critical Parameters |
|---|---|---|---|
| Flux Model Series | Base image generation | Custom | F.1-Fill-fp16 (detail enhancement) |
| CLIP Dual-Encoder | Text+Image understanding | OpenAI (CLIP), Google (T5-XXL) | t5xxl_fp8_e4m3fn.safetensors |
| Realistic LoRA | Lifestyle photo enhancement | Custom | F1.超真实日常生活照_1.0 ("hyper-realistic everyday-life photos") |


3. Critical Nodes

Core Processing Nodes

| Node | Function | Installation |
|---|---|---|
| FluxSamplerParams+ | Precision sampling control | Requires ComfyUI-Flux |
| FluxAttentionSeeker+ | Dynamic attention adjustment | Custom node |
| BaiduTranslateNode | Real-time prompt translation | Manual install |

Special Dependencies

  1. Flux Models:

    git clone https://github.com/FluxAI/ComfyUI-Flux
  2. Translation Service:

    • Requires Baidu API key

    • Configure in config/baidu_translate.json
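
A loader for the translation config could look like the sketch below. The key names `app_id` and `secret_key` are assumptions for illustration only; the schema that BaiduTranslateNode actually expects is not given here, so check the node's own documentation before relying on them.

```python
import json
from pathlib import Path

# Hypothetical key names; the real schema used by BaiduTranslateNode may differ.
REQUIRED_KEYS = ("app_id", "secret_key")


def load_translate_config(path="config/baidu_translate.json"):
    """Read the translation config and fail early if a credential is missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"{path} is missing required keys: {', '.join(missing)}")
    return config
```

Failing at load time keeps a missing API key from surfacing later as an opaque translation error in the middle of a run.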


4. Workflow Architecture

Processing Stages

| Stage | Key Nodes | Output |
|---|---|---|
| Input Prep | DualCLIPLoader → EmptySD3LatentImage | 1024x1504 latent space |
| Controlled Generation | FluxSamplerParams+ → ModelSamplingFlux | Parameterized output |
| Analysis | PlotParameters+ → SaveImage | Comparative results |
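
To make the Input Prep output concrete, the sketch below computes the latent tensor shape for a 1024x1504 image. It assumes the usual Flux/SD3 latent layout of 16 channels with 8x spatial downscaling, and that 1024 is the width and 1504 the height (the table does not say which is which). `latent_shape` is a hypothetical helper, not a ComfyUI function.

```python
def latent_shape(width, height, batch_size=1, channels=16, downscale=8):
    """Return (batch, channels, latent_height, latent_width) for an empty latent."""
    if width % downscale or height % downscale:
        raise ValueError(f"width and height must be multiples of {downscale}")
    return (batch_size, channels, height // downscale, width // downscale)


print(latent_shape(1024, 1504))  # (1, 16, 188, 128)
```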

Data Flow

graph LR
A[Text Prompt] --> B[CLIP Encoding]
C[Reference Image] --> D[Latent Space]
B --> E[Flux Sampling]
D --> E
E --> F[VAE Decode]
F --> G[Result Comparison]
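
The same graph can be written as a dependency map to see which stages must finish before others start. This is an illustrative sketch using Python's standard `graphlib` module (Python 3.9+); it models only the stage names from the diagram and does not call ComfyUI.

```python
from graphlib import TopologicalSorter

# Each key lists the stages it depends on, mirroring the arrows in the graph.
DEPENDENCIES = {
    "CLIP Encoding": {"Text Prompt"},
    "Latent Space": {"Reference Image"},
    "Flux Sampling": {"CLIP Encoding", "Latent Space"},
    "VAE Decode": {"Flux Sampling"},
    "Result Comparison": {"VAE Decode"},
}


def execution_order(dependencies):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
print(" -> ".join(order))
```

The text and image branches are independent until Flux Sampling, so the order of the first four stages can vary; only the sampling, decode, and comparison stages have a fixed position at the end.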

5. I/O Specifications

Input Requirements

  1. Text Prompts:

    • Chinese/English bilingual support

    • Example: "Urban fashion photoshoot with graffiti backdrop"

  2. Image Inputs:

    • Recommended 1024x1024 PNG

    • Alpha channel for masking
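
A reference image can be checked against these recommendations before it enters the workflow. The sketch below reads only the PNG header with the standard library; `check_reference_image` is a hypothetical helper, and it treats only color types 4 and 6 as carrying alpha (palette images with a transparency chunk are not detected).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
ALPHA_COLOR_TYPES = {4, 6}  # grayscale+alpha and RGBA, per the PNG spec


def inspect_png_header(data):
    """Return (width, height, has_alpha) from the first 26 bytes of a PNG file."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file or header is truncated")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, color_type in ALPHA_COLOR_TYPES


def check_reference_image(path, expected=(1024, 1024)):
    """Return a list of warnings for a reference image; empty means it fits."""
    with open(path, "rb") as handle:
        width, height, has_alpha = inspect_png_header(handle.read(26))
    warnings = []
    if (width, height) != expected:
        warnings.append(
            f"size is {width}x{height}, recommended {expected[0]}x{expected[1]}"
        )
    if not has_alpha:
        warnings.append("no alpha channel, so the image cannot carry a mask")
    return warnings
```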

Outputs

  • Primary Image: High-res generated result

  • Parameter Analysis: Visualized sampling metrics

  • Bilingual Captions: JSON metadata


6. Optimization Guide

  1. Performance Tweaks:

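    # In custom_nodes/flux_sampler.py:
    import torch
    # 'high' lets float32 matmuls use faster reduced-precision kernels (e.g. TF32) where supported
    torch.set_float32_matmul_precision('high')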
  2. Quality Adjustment:

    • Flux guidance_scale: 3.5-5.0

    • LoRA strength: 0.7-1.0

  3. Troubleshooting:

    • Blurry outputs: Increase steps (20→30)

    • Over-saturation: Reduce guidance_scale (3.5→2.5)
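
The recommended ranges and the steps adjustment can be combined into a small parameter sweep for side-by-side comparison. This is a minimal sketch with hypothetical helper names (`out_of_range`, `sweep`) and a default LoRA strength of 0.8 chosen only for illustration; it builds plain dictionaries and does not drive ComfyUI itself.

```python
from itertools import product

# Ranges quoted in the Quality Adjustment list.
RECOMMENDED = {"guidance": (3.5, 5.0), "lora_strength": (0.7, 1.0)}


def out_of_range(settings):
    """Return the names of settings that fall outside the recommended ranges."""
    return [
        name
        for name, (low, high) in RECOMMENDED.items()
        if name in settings and not low <= settings[name] <= high
    ]


def sweep(steps_options, guidance_options, lora_strength=0.8):
    """Build one settings dict per (steps, guidance) pair."""
    return [
        {"steps": steps, "guidance": guidance, "lora_strength": lora_strength}
        for steps, guidance in product(steps_options, guidance_options)
    ]


for settings in sweep([20, 30], [3.5, 5.0]):
    print(settings, "outside recommended range:", out_of_range(settings))
```

The range check only reports, it does not clamp, because the over-saturation fix deliberately drops guidance below the recommended band.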


7. Deployment

Step 1: Dependency Installation

pip install baidu-aip "torchvision>=0.15"

Step 2: Model Placement

  • Flux models: models/flux

  • LoRAs: models/loras
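
Before launching the workflow, the two folders can be checked with a short script. This sketch assumes the paths above are relative to the ComfyUI root directory; `missing_model_dirs` is a hypothetical helper.

```python
from pathlib import Path

# Folder names taken from the list above, relative to the ComfyUI root.
MODEL_DIRS = ("models/flux", "models/loras")


def missing_model_dirs(comfyui_root):
    """Return the expected model folders that do not exist or are empty."""
    root = Path(comfyui_root)
    missing = []
    for relative in MODEL_DIRS:
        folder = root / relative
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(relative)
    return missing
```

Calling `missing_model_dirs(".")` from the ComfyUI root returns an empty list once both folders contain at least one file.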

Verification Command

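# Check translation service (Python; replace the placeholder credentials with your own)
# Assumes the installed baidu-aip version exposes detectLang on AipNlp
from aip import AipNlp
print(AipNlp('APP_ID','API_KEY','SECRET_KEY').detectLang("test"))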

Real-World Use Case

Scenario: Sneaker ad generation

  1. Input:

    • Prompt: "限量版运动鞋,未来主义设计,霓虹灯光效" ("Limited-edition sneakers, futuristic design, neon lighting effects")

    • Blank canvas 1024x1024

  2. Process:

    • Applies hyper-realistic texture LoRA

    • Generates 3 style variants

    • Outputs English/Chinese captions

  3. Output:

    {
      "image": "sneaker_ad_final.png",
      "caption_en": "Limited edition sneakers with cyberpunk neon lighting",
      "parameters": {"steps":25, "cfg":3.8}
    }
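
Downstream tooling can read a record like the one above into a typed structure. The sketch below assumes exactly the fields shown in the example; `GenerationRecord` and `parse_record` are hypothetical names, and any further fields the workflow may write are ignored.

```python
import json
from dataclasses import dataclass


@dataclass
class GenerationRecord:
    image: str
    caption_en: str
    steps: int
    cfg: float


def parse_record(text):
    """Parse one metadata record, raising KeyError if a field is absent."""
    raw = json.loads(text)
    params = raw["parameters"]
    return GenerationRecord(
        image=raw["image"],
        caption_en=raw["caption_en"],
        steps=int(params["steps"]),
        cfg=float(params["cfg"]),
    )
```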

Processing Time: ~38s (RTX 4090)

Note: Ideal for marketing teams needing rapid visual prototyping with precise control.