Unlock Stunning Images: A Step-by-Step Guide to Flux.1-Based Text-to-Image Generation

ComfyUI.org

2025-03-11 08:01:50

Workflow Overview

This workflow generates images from text prompts (Text-to-Image, T2I) using the Flux.1 model. It combines Flux.1-dev with a style LoRA and a prompt-translation step for multilingual support, and it produces images in a specific style (e.g., Japanese temple architecture). The final output is a 1024x1280 image, suitable for artistic or professional photography-style applications.

Core Models

Flux.1-dev (flux1-dev.sft)
- Function: An efficient T2I model excelling in detailed, realistic image generation.
- Source: Download from Flux official channels (e.g., Hugging Face), place in ComfyUI/models/unet/.
LoRA: flux-lora-建筑3D立体剪纸-03.safetensors
- Function: A LoRA (fine-tuned adapter) that adds a 3D papercut architectural style to Flux.1.
- Source: Obtain from communities (e.g., Civitai) or custom training, place in ComfyUI/models/loras/.
T5-XXL (t5xxl_fp16.safetensors)
- Function: A robust text encoder converting complex prompts into embeddings.
- Source: Download from ComfyUI official or Hugging Face, place in ComfyUI/models/text_encoders/.
CLIP-L (clip_l.safetensors)
- Function: A lightweight CLIP model, paired with T5-XXL for prompt encoding.
- Source: Download from ComfyUI official or Hugging Face, place in ComfyUI/models/clip/ (recent ComfyUI versions also read text encoders from ComfyUI/models/text_encoders/).
VAE (ae.safetensors)
- Function: Variational Autoencoder decoding latents into images.
- Source: Download from Flux official channels, place in ComfyUI/models/vae/.
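
Since a misplaced file is the most common setup problem, the folder layout listed above can be checked before launching ComfyUI. The following is a minimal sketch, not part of the workflow itself; the function name find_missing_models and the "ComfyUI" root path are illustrative assumptions, and the folder and file names are copied from the list above.

```python
from pathlib import Path

# Folder -> file name, copied from the Core Models list above.
EXPECTED_FILES = {
    "unet": "flux1-dev.sft",
    "loras": "flux-lora-建筑3D立体剪纸-03.safetensors",
    "text_encoders": "t5xxl_fp16.safetensors",
    "clip": "clip_l.safetensors",
    "vae": "ae.safetensors",
}


def find_missing_models(comfy_root):
    """Return the relative paths of expected model files that are absent."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for folder, filename in EXPECTED_FILES.items():
        if not (models_dir / folder / filename).is_file():
            missing.append(f"models/{folder}/{filename}")
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a hypothetical install location; adjust to your setup.
    for path in find_missing_models("ComfyUI"):
        print("missing:", path)
```

An empty result means every file named in this guide sits in the folder this guide assigns to it; it does not verify that the files are complete or uncorrupted.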

Component Explanation

DualCLIPLoader
- Purpose: Loads T5-XXL and CLIP-L text encoders.
- Function: Prepares dual CLIP encoding for Flux, enhancing prompt comprehension.
- Installation: Built into ComfyUI.
- Dependencies: Requires t5xxl_fp16.safetensors and clip_l.safetensors.
CLIPTextEncode
- Purpose: Encodes text prompts into conditioning inputs.
- Function: Takes the CLIP output and the prompt text, and outputs conditioning data for generation.
- Installation: Built into ComfyUI.
EmptyLatentImage
- Purpose: Creates an empty latent as the starting point for image generation.
- Function: Sets output resolution to 1024x1280.
- Installation: Built into ComfyUI.
KSamplerAdvanced
- Purpose: Performs sampling for the Flux model.
- Function: Denoises the latent using the DPM++ 2M sampler over 30 steps; the result is decoded into an image by VAEDecode.
- Installation: Built into ComfyUI.
LoraLoaderModelOnly
- Purpose: Loads and applies the LoRA model to Flux.1.
- Function: Integrates LoRA at 0.8 strength for style enhancement.
- Installation: Built into ComfyUI.
- Dependencies: Requires flux-lora-建筑3D立体剪纸-03.safetensors.
VAEDecode
- Purpose: Decodes latents into the final image.
- Function: Outputs image data using VAE.
- Installation: Built into ComfyUI.
FluxGuidance
- Purpose: Adjusts positive conditioning strength.
- Function: Sets guidance to 3.5, controlling prompt adherence.
- Installation: Built into ComfyUI (ships with its Flux support).
ConditioningZeroOut
- Purpose: Creates an empty negative condition.
- Function: Ensures generation relies solely on positive prompts.
- Installation: Built into ComfyUI.
UNETLoader
- Purpose: Loads the Flux.1 diffusion model (the node keeps the UNet name, although Flux.1 is transformer-based).
- Function: Provides the core generation network.
- Installation: Built into ComfyUI.
- Dependencies: Requires flux1-dev.sft.
VAELoader
- Purpose: Loads the VAE model.
- Function: Supports image decoding.
- Installation: Built into ComfyUI.
- Dependencies: Requires ae.safetensors.
SaveImage
- Purpose: Saves the generated image.
- Function: Saves as PNG with “ComfyUI” prefix.
- Installation: Built into ComfyUI.
DeepTranslatorTextNode
- Purpose: Translates input prompts.
- Function: Converts English to Chinese (e.g., “Product on a white sink…”), supports multilingual input.
- Installation: Requires ComfyUI_Custom_Nodes_AlekPet, install via ComfyUI Manager (search “AlekPet”) or GitHub (https://github.com/AlekPet/ComfyUI_Custom_Nodes_AlekPet).
ShowText|pysssss
- Purpose: Displays translated text.
- Function: Useful for debugging or verifying translations.
- Installation: Requires ComfyUI-Custom-Scripts, install via ComfyUI Manager (search “Custom-Scripts”) or GitHub (https://github.com/pysssss/ComfyUI-Custom-Scripts).
CR Prompt Text
- Purpose: Provides initial text prompt input.
- Function: Outputs user-defined prompts, supports multiline text.
- Installation: Requires ComfyUI_Comfyroll_CustomNodes, install via ComfyUI Manager (search “Comfyroll”) or GitHub (https://github.com/RockOfFire/ComfyUI_Comfyroll_CustomNodes).
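
To confirm that the three custom-node packages are actually loaded, the node class names above can be compared with what a running ComfyUI instance reports. This is a rough sketch: it relies on ComfyUI's /object_info HTTP endpoint, which returns a JSON object keyed by node class name. The address http://127.0.0.1:8188 is an assumed default, and both function names are illustrative.

```python
import json
import urllib.request

# Node class names exactly as they appear in this workflow.
REQUIRED_NODES = [
    "DualCLIPLoader", "CLIPTextEncode", "EmptyLatentImage",
    "KSamplerAdvanced", "LoraLoaderModelOnly", "VAEDecode",
    "FluxGuidance", "ConditioningZeroOut", "UNETLoader",
    "VAELoader", "SaveImage",
    "DeepTranslatorTextNode",  # ComfyUI_Custom_Nodes_AlekPet
    "ShowText|pysssss",        # ComfyUI-Custom-Scripts
    "CR Prompt Text",          # ComfyUI_Comfyroll_CustomNodes
]


def missing_node_classes(object_info):
    """Return required node classes that are not keys of object_info."""
    return [name for name in REQUIRED_NODES if name not in object_info]


def fetch_object_info(base_url="http://127.0.0.1:8188"):
    """Fetch the node registry from a running ComfyUI server (assumed address)."""
    with urllib.request.urlopen(f"{base_url}/object_info") as response:
        return json.load(response)


if __name__ == "__main__":
    for name in missing_node_classes(fetch_object_info()):
        print("not installed:", name)
```

Any name printed here points to the package that still needs to be installed through ComfyUI Manager or GitHub.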

Workflow Structure

Prompt Input and Translation Group
- Nodes: CR Prompt Text → DeepTranslatorTextNode → ShowText|pysssss
- Role: Inputs English prompts and translates them to Chinese for debugging or reference.
- Input Parameters: English prompt (e.g., “A highly detailed, red-toned digital illustration…”).
- Output: Translated Chinese prompt (e.g., “一个高度详细的红色数字插图…”).
Model Loading and Encoding Group
- Nodes: UNETLoader → LoraLoaderModelOnly (model branch); DualCLIPLoader → CLIPTextEncode (text branch)
- Role: Loads Flux.1, LoRA, and CLIP encoders, encoding the prompt into conditioning.
- Input Parameters: Prompt, model paths, LoRA strength (0.8).
- Output: Encoded positive conditioning.
Conditioning Adjustment Group
- Nodes: FluxGuidance → ConditioningZeroOut
- Role: Adjusts positive conditioning strength (3.5) and creates an empty negative condition.
- Input Parameters: Encoded conditioning.
- Output: Adjusted positive and negative conditioning.
Image Generation Group
- Nodes: EmptyLatentImage → KSamplerAdvanced → VAEDecode
- Role: Generates a 1024x1280 latent, samples it, and decodes it into an image.
- Input Parameters: Resolution (1024x1280), steps (30), guidance (3.5).
- Output: High-quality image.
Output Group
- Nodes: SaveImage
- Role: Saves the generated image.
- Input Parameters: Generated image data.
- Output: PNG image file.
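
The groups above can also be written out as a ComfyUI API-format graph, where each node is an entry with a class_type and inputs, and a link is a [node_id, output_index] pair. The sketch below is an approximation of this workflow, not an export of it. It omits the three custom prompt and translation nodes and passes the prompt string directly. The node IDs, the cfg value of 1.0, the "simple" scheduler, the weight_dtype, the width/height orientation, and feeding ConditioningZeroOut from the CLIPTextEncode output are all assumptions not stated in this guide.

```python
def build_flux_graph(prompt_text, seed):
    """Return an API-format graph approximating the workflow described above."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-dev.sft", "weight_dtype": "default"}},
        "2": {"class_type": "LoraLoaderModelOnly",
              "inputs": {"model": ["1", 0],
                         "lora_name": "flux-lora-建筑3D立体剪纸-03.safetensors",
                         "strength_model": 0.8}},
        "3": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "t5xxl_fp16.safetensors",
                         "clip_name2": "clip_l.safetensors", "type": "flux"}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt_text, "clip": ["3", 0]}},
        "5": {"class_type": "FluxGuidance",
              "inputs": {"conditioning": ["4", 0], "guidance": 3.5}},
        # Assumption: the empty negative is derived from the positive encoding.
        "6": {"class_type": "ConditioningZeroOut",
              "inputs": {"conditioning": ["4", 0]}},
        # Assumption: 1024x1280 is read as width x height.
        "7": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1280, "batch_size": 1}},
        "8": {"class_type": "KSamplerAdvanced",
              "inputs": {"model": ["2", 0], "add_noise": "enable",
                         "noise_seed": seed, "steps": 30,
                         "cfg": 1.0,             # assumed, not given in the guide
                         "sampler_name": "dpmpp_2m",
                         "scheduler": "simple",  # assumed, not given in the guide
                         "positive": ["5", 0], "negative": ["6", 0],
                         "latent_image": ["7", 0],
                         "start_at_step": 0, "end_at_step": 10000,
                         "return_with_leftover_noise": "disable"}},
        "9": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
        "10": {"class_type": "VAEDecode",
               "inputs": {"samples": ["8", 0], "vae": ["9", 0]}},
        "11": {"class_type": "SaveImage",
               "inputs": {"images": ["10", 0], "filename_prefix": "ComfyUI"}},
    }


def dangling_links(graph):
    """List (node_id, input_name) pairs whose link targets a missing node."""
    bad = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and value[0] not in graph:
                bad.append((node_id, name))
    return bad
```

A graph like this is typically submitted as {"prompt": graph} in a POST request to the /prompt endpoint of a running ComfyUI server. Writing it out makes the two parallel branches (model and text) and their meeting point at KSamplerAdvanced explicit.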

Inputs and Outputs

Expected Inputs:
- Text prompt: Multiline English description (e.g., “A highly detailed, red-toned digital illustration…”).
- Resolution: 1024x1280.
- Seed: Randomized on each run (349017919967907 in the saved workflow).
- Sampling steps: 30.
- Guidance: 3.5.
- LoRA strength: 0.8.
Final Output:
- 1024x1280 high-quality image, saved as PNG (prefix “ComfyUI”).
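
To see what the 1024x1280 resolution means for the sampler, the arithmetic below converts pixel dimensions to latent dimensions and to the number of image tokens. It is a sketch under two assumptions that this guide does not state: that the Flux VAE downsamples by a factor of 8, and that Flux.1 groups latents into 2x2 patches. The function name is illustrative.

```python
def flux_token_grid(width, height, vae_factor=8, patch=2):
    """Return (latent_width, latent_height, image_tokens) for a pixel size.

    vae_factor and patch are assumed values for Flux.1, not taken from the guide.
    """
    step = vae_factor * patch
    if width % step or height % step:
        # This sketch only handles sizes divisible by vae_factor * patch.
        raise ValueError(f"width and height should be multiples of {step}")
    latent_w = width // vae_factor
    latent_h = height // vae_factor
    tokens = (latent_w // patch) * (latent_h // patch)
    return latent_w, latent_h, tokens


print(flux_token_grid(1024, 1280))  # (128, 160, 5120)
```

Under these assumptions, halving both dimensions cuts the token count to a quarter, which is why lowering the resolution is usually a more effective speed-up than trimming a few sampling steps.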

Notes and Tips

Resource Requirements: Flux.1 requires significant VRAM (12GB+ recommended); use fp8 versions if VRAM is limited.
Model Files: Ensure all files (flux1-dev.sft, t5xxl_fp16.safetensors, etc.) are placed in the folders listed under Core Models; otherwise the loader nodes will not find them and the workflow will fail.
Performance Optimization: Reduce steps (e.g., from 30 to 20) if generation is slow.
Plugin Installation: Install AlekPet, Custom-Scripts, and Comfyroll plugins, or translation/prompt input will fail.
Translation: DeepTranslatorTextNode uses Google Translate; ensure network access or configure a proxy.
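
The fp8 suggestion in the Resource Requirements note can be made concrete with a rough weight-size estimate. The sketch below assumes that Flux.1-dev has about 12 billion parameters, a figure from the model's public description and not from this guide. It counts only the diffusion model weights and ignores the text encoders, VAE, and activations.

```python
FLUX_DEV_PARAMS = 12e9  # assumed approximate parameter count


def weight_size_gib(num_params, bytes_per_param):
    """Return the size of the raw weights in GiB."""
    return num_params * bytes_per_param / 1024**3


fp16_gib = weight_size_gib(FLUX_DEV_PARAMS, 2)  # 2 bytes per parameter
fp8_gib = weight_size_gib(FLUX_DEV_PARAMS, 1)   # 1 byte per parameter
print(f"fp16: {fp16_gib:.1f} GiB, fp8: {fp8_gib:.1f} GiB")
```

Under that assumption, fp8 weights take about half the memory of fp16 weights, which is the reason for preferring them on cards with limited VRAM.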
