Unlock FLUX: The Ultimate Multimodal Workflow for Text-to-Image and Image Captioning

ComfyUI.org
2025-06-05 09:53:20

1. Workflow Overview


This is a FLUX-based multimodal workflow supporting both text-to-image and image captioning, featuring:

  • Llama-3 8B for automatic image description

  • FLUX MIX V2 as base model (FP8 quantized)

  • Multi-LoRA stacking (Shining Nikki series)

  • Built-in Baidu translation node

Core Models:

  • FLUX MIX V2: Base generation model

  • Meta-Llama-3-8B: Image captioning model

  • Shining Nikki LoRAs: Costume style control


2. Critical Nodes

Node                  Function                     Installation
Joy_caption_two       Llama-3 caption generation   Install comfyui_slk_joy_caption_two
BaiduTranslateNode    CN/EN prompt translation     Requires API key
CR Text Concatenate   Dynamic prompt merging       Via Comfyroll
Fast Groups Bypasser  Modular toggle control       Needs rgthree

Dependencies:

  • Model Files:

    • Place 小白_FLUX_MIX_V2.safetensors in models/unet

    • Download the t5xxl_fp8_e4m3fn text encoder (loaded as a CLIP model in ComfyUI)

  • Plugins: Impact Pack must be installed
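
Before loading the workflow, it can help to confirm that the model files are where ComfyUI expects them. The following is a minimal sketch, not part of the workflow itself: the function name `missing_model_files` is hypothetical, the unet path comes from the list above, and the folder and extension used for the t5xxl file (`models/clip`, `.safetensors`) are assumptions that may differ in your install.

```python
from pathlib import Path

# Relative paths under the ComfyUI root. The unet entry comes from the
# dependency list above; the text-encoder folder and extension are assumptions.
EXPECTED_FILES = [
    "models/unet/小白_FLUX_MIX_V2.safetensors",
    "models/clip/t5xxl_fp8_e4m3fn.safetensors",  # assumed location
]


def missing_model_files(comfy_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under comfy_root."""
    root = Path(comfy_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own install directory.
    for rel in missing_model_files("ComfyUI"):
        print(f"missing: {rel}")
```

An empty result means both files were found; anything printed should be downloaded or moved before running the workflow.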


3. Workflow Structure

Key Groups:

  1. Prompt Input (Top-Left):

    • Supports CN/EN input + auto translation

    • Dynamic LoRA trigger word insertion

  2. FLUX Generation (Center):

    • 896x1152 resolution + Euler sampler

    • Three-stage LoRA stacking (weights between 0.2 and 0.8)

  3. Output (Right):

    • Batch generation (max 3 images for non-VIP accounts)

    • Auto-upload to Liblib cloud
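
The prompt and LoRA groups above can be pictured as plain data plus a string join. This is an illustrative sketch only: it imitates the idea of appending LoRA trigger words to the prompt and keeping three stacked weights inside the 0.2 to 0.8 range, but it does not call any ComfyUI node. The LoRA filenames, trigger words, and the helper names `LoraStage`, `validate_stack`, and `merge_prompt` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoraStage:
    name: str      # hypothetical LoRA filename
    weight: float  # strength applied at this stage
    trigger: str   # hypothetical trigger word for this LoRA


def validate_stack(stages, low=0.2, high=0.8, max_stages=3):
    """Raise ValueError if the stack is too deep or a weight is out of range."""
    if len(stages) > max_stages:
        raise ValueError(f"at most {max_stages} LoRA stages are supported")
    for stage in stages:
        if not low <= stage.weight <= high:
            raise ValueError(f"{stage.name}: weight {stage.weight} outside [{low}, {high}]")


def merge_prompt(base_prompt, stages, separator=", "):
    """Append each stage's trigger word to the base prompt, skipping empty parts."""
    parts = [base_prompt.strip()] + [s.trigger for s in stages if s.trigger]
    return separator.join(p for p in parts if p)


# Example stack with made-up names and weights inside the stated range.
stack = [
    LoraStage("nikki_dress_a.safetensors", 0.8, "nikki_dress_a"),
    LoraStage("nikki_dress_b.safetensors", 0.5, "nikki_dress_b"),
    LoraStage("nikki_dress_c.safetensors", 0.2, "nikki_dress_c"),
]

if __name__ == "__main__":
    validate_stack(stack)
    print(merge_prompt("a girl in a rose garden", stack))
```

Keeping the trigger word next to its weight makes it harder to enable a LoRA without also adding the word that activates it.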


4. Inputs & Outputs

Required Inputs:

  • Positive prompt: Chinese text is accepted directly

  • Optional image: used for captioning (toggle ② must be off)

  • LoRA selection: Dropdown menu

Final Output:

  • Format: JPG/PNG with metadata

  • Path: Auto-saved to Liblib cloud


5. Notes

  • VRAM: ≥16GB required (FP8 + multi-LoRA)

  • Common Errors:

    • LoRA not found: remove spaces from the LoRA filename

    • Translation failed: configure the Baidu API key

  • Optimization:

    • Disable unused LoRA groups

    • Reduce resolution to 768x1024 for batch generation
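
To get a rough sense of what the lower resolution saves, the pixel counts of the two sizes can be compared directly. This is simple arithmetic on the numbers above, not a VRAM measurement: actual memory use also depends on the model, the LoRAs, and the batch size, so treat the ratio as an approximation. The helper name `pixel_count` is hypothetical.

```python
def pixel_count(width, height):
    """Number of pixels in one image of the given size."""
    return width * height


default_px = pixel_count(896, 1152)  # default resolution of the workflow
reduced_px = pixel_count(768, 1024)  # suggested size for batch generation
ratio = reduced_px / default_px      # fraction of the original pixel count

if __name__ == "__main__":
    print(f"{default_px} -> {reduced_px} pixels ({1 - ratio:.1%} fewer per image)")
```

Each image at 768x1024 has about 24% fewer pixels than at 896x1152, which is where the saving in a three-image batch comes from.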