Unlock FLUX: The Ultimate Multimodal Workflow for Text-to-Image and Image Captioning
1. Workflow Overview

This FLUX-based multimodal workflow supports both text-to-image generation and image captioning. It features:
Llama-3 8B for automatic image description
FLUX MIX V2 as the base model (FP8-quantized)
Multi-LoRA stacking (Shining Nikki series)
Built-in Baidu translation node
Core Models:
FLUX MIX V2: Base generation model
Meta-Llama-3-8B: Image captioning model
Shining Nikki LoRAs: Costume style control
2. Critical Nodes
| Node | Function | Installation |
|---|---|---|
| | Llama-3 caption generation | Install |
| | CN/EN prompt translation | Requires API key |
| | Dynamic prompt merging | Via |
| | Modular toggle control | Needs |
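The translation node needs a Baidu API key. Assuming it calls Baidu's general text translation API (endpoint assumed to be `https://fanyi-api.baidu.com/api/trans/vip/translate`), each request carries an `appid`, a random `salt`, and a `sign` equal to the MD5 hex digest of `appid + q + salt + secret key`. The sketch below only builds the signed parameters; `APP_ID` and `SECRET_KEY` are placeholders, and the node's own implementation may differ.

```python
import hashlib
import random
from typing import Optional

# Placeholders (assumption): use the APP ID and secret key from your Baidu Translate console.
APP_ID = "your_app_id"
SECRET_KEY = "your_secret_key"
# Assumed endpoint of Baidu's general text translation API.
ENDPOINT = "https://fanyi-api.baidu.com/api/trans/vip/translate"


def build_baidu_params(text: str, src: str = "zh", dst: str = "en",
                       salt: Optional[str] = None) -> dict:
    """Build the signed query parameters for one translation request."""
    if salt is None:
        salt = str(random.randint(32768, 65536))  # random number used as salt
    # sign = MD5(appid + q + salt + secret key), computed over UTF-8 bytes
    raw = APP_ID + text + salt + SECRET_KEY
    sign = hashlib.md5(raw.encode("utf-8")).hexdigest()
    return {"q": text, "from": src, "to": dst,
            "appid": APP_ID, "salt": salt, "sign": sign}


params = build_baidu_params("穿着礼服的女孩")  # "a girl in a gown"
print(params)
```

Sending these parameters to `ENDPOINT` (for example with `urllib.request` and `urllib.parse.urlencode`) returns JSON. A wrong APP ID or key produces an error response instead of a translation, which is one likely cause of a "Translation failed" message.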
Dependencies:
Model Files:
Place 小白_FLUX_MIX_V2.safetensors in models/unet
Download the t5xxl_fp8_e4m3fn CLIP model
Plugins: installing Impact Pack is essential
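Before loading the workflow, a quick script can confirm the model files sit where the loaders expect them. This is a minimal sketch assuming a standard ComfyUI folder layout; the `.safetensors` extension on the T5 file and the `models/clip` folder are assumptions, since the text above gives only the model name.

```python
from pathlib import Path

# Assumed ComfyUI layout. The T5 file extension and the models/clip folder
# are assumptions; newer ComfyUI builds may also read models/diffusion_models
# and models/text_encoders.
REQUIRED = {
    "models/unet": ["小白_FLUX_MIX_V2.safetensors"],
    "models/clip": ["t5xxl_fp8_e4m3fn.safetensors"],
}


def missing_models(comfy_root: str) -> list[str]:
    """Return the relative paths of required model files that are absent."""
    root = Path(comfy_root)
    missing = []
    for folder, names in REQUIRED.items():
        for name in names:
            rel = f"{folder}/{name}"
            if not (root / rel).is_file():
                missing.append(rel)
    return missing


for rel in missing_models("ComfyUI"):  # path to your install (assumption)
    print("missing:", rel)
```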
3. Workflow Structure
Key Groups:
Prompt Input (Top-Left):
Accepts Chinese or English input, with automatic translation
Dynamic LoRA trigger word insertion
FLUX Generation (Center):
896x1152 resolution with the Euler sampler
Three-stage LoRA stacking (weights 0.2 to 0.8)
Output (Right):
Batch generation (maximum batch size of 3 for non-VIP users)
Automatic upload to Liblib cloud
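In the standard LoRA formulation, stacking three LoRAs adds each adapter's low-rank update to the base weight, scaled by its strength: W' = W + s1·B1A1 + s2·B2A2 + s3·B3A3, with each strength s here between 0.2 and 0.8. The pure-Python sketch below applies that sum to a toy 2x2 weight. The matrices and strengths are made-up illustrations, and real loaders (ComfyUI's included) typically also apply an alpha/rank scale and patch every targeted layer.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply_loras(weight, loras):
    """Return W + sum(strength * B @ A) for each (strength, B, A) in loras."""
    out = [row[:] for row in weight]  # copy so the base weight is untouched
    for strength, b, a in loras:
        delta = matmul(b, a)  # low-rank update with the same shape as weight
        for i in range(len(out)):
            for j in range(len(out[0])):
                out[i][j] += strength * delta[i][j]
    return out


# Toy 2x2 base weight and three rank-1 adapters (values are illustrative).
W = [[1.0, 0.0], [0.0, 1.0]]
loras = [
    (0.8, [[1.0], [0.0]], [[0.5, 0.5]]),
    (0.5, [[0.0], [1.0]], [[1.0, 0.0]]),
    (0.2, [[1.0], [1.0]], [[0.0, 1.0]]),
]
print(apply_loras(W, loras))
```

Because the updates are summed, lowering one LoRA's strength reduces only its own contribution, which is why the weights can be tuned independently.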
4. Inputs & Outputs
Required Inputs:
Positive prompt: Chinese input is accepted directly
Optional image: used for captioning (toggle ② must be off)
LoRA selection: via dropdown menu
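Selecting a LoRA also inserts its trigger words into the prompt. One simple way to do this is to prepend each selected LoRA's trigger words and skip any already present. The LoRA filenames and trigger words below are hypothetical (the real Shining Nikki triggers are not given here), and the workflow's merging node may order or deduplicate differently.

```python
# Hypothetical mapping from LoRA file to its trigger word(s).
TRIGGERS = {
    "nikki_style_a.safetensors": ["nikki_a"],
    "nikki_style_b.safetensors": ["nikki_b", "costume"],
}


def merge_prompt(user_prompt: str, selected_loras: list[str]) -> str:
    """Prepend trigger words of the selected LoRAs, skipping duplicates."""
    parts = []
    for lora in selected_loras:
        for word in TRIGGERS.get(lora, []):
            # simple substring check against the user's own prompt
            if word not in parts and word not in user_prompt:
                parts.append(word)
    parts.append(user_prompt.strip())
    return ", ".join(p for p in parts if p)


print(merge_prompt("a girl in a red dress", ["nikki_style_b.safetensors"]))
```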
Final Output:
Format: JPG/PNG with metadata
Path: automatically saved to Liblib cloud
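ComfyUI's default image saver stores the `prompt` and `workflow` JSON as PNG text chunks; assuming the downloaded PNG keeps that metadata, it can be read back with the standard library alone. The sketch parses `tEXt` (Latin-1) and `iTXt` (UTF-8, optionally compressed) chunks; `make_chunk` is a helper added only to build a demo input. JPG outputs store metadata differently and are not covered.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Encode one PNG chunk: length, type, body, then CRC-32 of type + body."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def read_png_text(data: bytes) -> dict:
    """Collect tEXt and iTXt chunks (keyword -> text) from raw PNG bytes."""
    if not data.startswith(PNG_SIG):
        raise ValueError("not a PNG file")
    pos, out = len(PNG_SIG), {}
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":  # keyword NUL Latin-1 text
            key, _, value = body.partition(b"\x00")
            out[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"iTXt":  # UTF-8 text, optionally zlib-compressed
            key, _, rest = body.partition(b"\x00")
            compressed, rest = rest[0], rest[2:]  # skip compression method byte
            _lang, _, rest = rest.partition(b"\x00")
            _translated, _, text = rest.partition(b"\x00")
            raw = zlib.decompress(text) if compressed else text
            out[key.decode("latin-1")] = raw.decode("utf-8")
        elif ctype == b"IEND":
            break
        pos += 12 + length  # 4-byte length + 4-byte type + body + 4-byte CRC
    return out


# Round trip on a minimal chunk sequence (not a displayable image).
demo = (PNG_SIG
        + make_chunk(b"tEXt", b'prompt\x00{"seed": 1}')
        + make_chunk(b"iTXt", b"workflow\x00\x00\x00\x00\x00" + "女孩".encode("utf-8"))
        + make_chunk(b"IEND", b""))
print(read_png_text(demo))
```

For a saved file, pass the file's bytes, e.g. `read_png_text(open(path, "rb").read())`.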
5. Notes
VRAM: at least 16 GB required (FP8 model plus multiple LoRAs)
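A rough way to see where the memory goes is to estimate weight size as parameters × bits ÷ 8. The parameter counts below are assumptions (about 12B for a FLUX.1-class transformer, about 4.7B for the T5-XXL encoder, about 8B for Llama-3 8B), not figures from this workflow, and the estimate ignores activations, the VAE, and LoRA weights. How the captioning node loads Llama-3 is not specified here, so both an FP16 and a 4-bit row are shown.

```python
def weight_gib(params_billion: float, bits: int) -> float:
    """Approximate weight memory in GiB for a model stored at `bits` per parameter."""
    return params_billion * 1e9 * bits / 8 / 2**30


# Assumed, approximate parameter counts (not taken from this workflow).
models = {
    "FLUX transformer (~12B, FP8)": (12.0, 8),
    "T5-XXL encoder (~4.7B, FP8)": (4.7, 8),
    "Llama-3 8B (FP16)": (8.0, 16),
    "Llama-3 8B (4-bit)": (8.0, 4),
}
for name, (p, bits) in models.items():
    print(f"{name}: {weight_gib(p, bits):.1f} GiB")
```

Since the FP8 transformer and text encoder alone come close to 16 GB, ComfyUI typically offloads models that are not currently in use rather than keeping all of them resident.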
Common Errors:
LoRA not found: avoid spaces in LoRA filenames
Translation failed: configure the Baidu API key
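For the "LoRA not found" error, a small script can list LoRA files whose names contain whitespace and optionally rename them. This is a sketch assuming the files live in a folder such as `ComfyUI/models/loras` (an assumption); after renaming, the LoRA loader nodes must be pointed at the new names.

```python
import os
import re


def suggest_renames(filenames: list[str]) -> dict[str, str]:
    """Map each name containing whitespace to a version with underscores."""
    return {name: re.sub(r"\s+", "_", name)
            for name in filenames if re.search(r"\s", name)}


def rename_loras(lora_dir: str, dry_run: bool = True) -> dict[str, str]:
    """Rename LoRA files in lora_dir; with dry_run=True only report the plan."""
    plan = suggest_renames(os.listdir(lora_dir))
    if not dry_run:
        for old, new in plan.items():
            target = os.path.join(lora_dir, new)
            if os.path.exists(target):  # never overwrite an existing file
                continue
            os.rename(os.path.join(lora_dir, old), target)
    return plan
```

Running `rename_loras("ComfyUI/models/loras")` first in dry-run mode shows what would change before any file is touched.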
Optimization:
Disable unused LoRA groups
Reduce resolution to 768x1024 for batch generation
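The saving from 896x1152 to 768x1024 can be quantified. Assuming the FLUX transformer sees the image as 16-channel latents downsampled 8x by the VAE and then grouped into 2x2 patches, each token covers a 16x16 pixel block. The sketch below computes pixel and token counts for both resolutions.

```python
def flux_tokens(width: int, height: int, vae_factor: int = 8, patch: int = 2) -> int:
    """Image tokens seen by the transformer, assuming 8x VAE downsampling and 2x2 patches."""
    side = vae_factor * patch  # pixels per token along each axis
    return (width // side) * (height // side)


for w, h in [(896, 1152), (768, 1024)]:
    print(f"{w}x{h}: {w * h:,} px, {flux_tokens(w, h)} tokens")
```

Under these assumptions, 896x1152 gives 56 x 72 = 4032 tokens and 768x1024 gives 48 x 64 = 3072, about 24% fewer. Because self-attention cost grows roughly with the square of the token count, that part of each step drops to about (3072/4032)² ≈ 0.58 of its original cost, which adds up over a batch.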