Unlock Stunning Images: A Step-by-Step Guide to Flux.1-Based Text-to-Image Generation

ComfyUI.org
2025-03-11 08:01:50

Workflow Overview

This workflow is a Text-to-Image (T2I) pipeline built on the Flux.1 model that generates high-resolution images from text prompts. It combines Flux.1-dev with a style LoRA and a prompt-translation step, producing images in a specific style (e.g., Japanese temple architecture). The final output is a 1024x1280 image, suited to artistic or professional photography-style work.

Core Models

  1. Flux.1-dev (flux1-dev.sft)

    • Function: An efficient T2I model excelling in detailed, realistic image generation.

    • Source: Download from Flux official channels (e.g., Hugging Face), place in ComfyUI/models/unet/.

  2. LoRA: flux-lora-建筑3D立体剪纸-03.safetensors

    • Function: A fine-tuned LoRA adapter that adds a 3D papercut architectural style to Flux.1.

    • Source: Obtain from communities (e.g., Civitai) or custom training, place in ComfyUI/models/loras/.

  3. T5-XXL (t5xxl_fp16.safetensors)

    • Function: A robust text encoder converting complex prompts into embeddings.

    • Source: Download from ComfyUI official or Hugging Face, place in ComfyUI/models/text_encoders/.

  4. CLIP-L (clip_l.safetensors)

    • Function: A lightweight CLIP model, paired with T5-XXL for prompt encoding.

    • Source: Download from ComfyUI official or Hugging Face, place in ComfyUI/models/clip/.

  5. VAE (ae.safetensors)

    • Function: Variational Autoencoder decoding latents into images.

    • Source: Download from Flux official channels, place in ComfyUI/models/vae/.
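
Before opening the workflow, it can help to confirm that each file sits in the folder listed above. The following is a minimal sketch using only the Python standard library; the function name missing_model_files and the "ComfyUI" root path are illustrative assumptions, not part of ComfyUI itself.

```python
from pathlib import Path

# Folder -> file name, copied from the Core Models list above.
EXPECTED_FILES = {
    "models/unet": "flux1-dev.sft",
    "models/loras": "flux-lora-建筑3D立体剪纸-03.safetensors",
    "models/text_encoders": "t5xxl_fp16.safetensors",
    "models/clip": "clip_l.safetensors",
    "models/vae": "ae.safetensors",
}


def missing_model_files(comfyui_root):
    """Return the expected model paths that do not exist under comfyui_root."""
    root = Path(comfyui_root)
    missing = []
    for folder, filename in EXPECTED_FILES.items():
        path = root / folder / filename
        if not path.is_file():
            missing.append(path)
    return missing


# "ComfyUI" is a placeholder; point it at your own installation folder.
for path in missing_model_files("ComfyUI"):
    print(f"missing: {path}")
```

An empty result means every file was found where this guide expects it.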

Component Explanation

  1. DualCLIPLoader

    • Purpose: Loads T5-XXL and CLIP-L text encoders.

    • Function: Prepares dual CLIP encoding for Flux, enhancing prompt comprehension.

    • Installation: Built into ComfyUI.

    • Dependencies: Requires t5xxl_fp16.safetensors and clip_l.safetensors.

  2. CLIPTextEncode

    • Purpose: Encodes text prompts into conditioning inputs.

    • Function: Takes the loaded text encoders and the prompt text, and outputs the conditioning used for generation.

    • Installation: Built into ComfyUI.

  3. EmptyLatentImage

    • Purpose: Creates an empty latent as the starting point for image generation.

    • Function: Sets output resolution to 1024x1280.

    • Installation: Built into ComfyUI.

  4. KSamplerAdvanced

    • Purpose: Performs sampling for the Flux model.

    • Function: Generates images using DPM++ 2M sampler, 30 steps.

    • Installation: Built into ComfyUI.

  5. LoraLoaderModelOnly

    • Purpose: Loads and applies the LoRA model to Flux.1.

    • Function: Integrates LoRA at 0.8 strength for style enhancement.

    • Installation: Built into ComfyUI.

    • Dependencies: Requires flux-lora-建筑3D立体剪纸-03.safetensors.

  6. VAEDecode

    • Purpose: Decodes latents into the final image.

    • Function: Outputs image data using VAE.

    • Installation: Built into ComfyUI.

  7. FluxGuidance

    • Purpose: Adjusts positive conditioning strength.

    • Function: Sets guidance to 3.5, controlling prompt adherence.

    • Installation: Built into ComfyUI (ships with its Flux support).

  8. ConditioningZeroOut

    • Purpose: Creates an empty negative condition.

    • Function: Ensures generation relies solely on positive prompts.

    • Installation: Built into ComfyUI.

  9. UNETLoader

    • Purpose: Loads the Flux.1 UNet model.

    • Function: Provides the core generation network.

    • Installation: Built into ComfyUI.

    • Dependencies: Requires flux1-dev.sft.

  10. VAELoader

    • Purpose: Loads the VAE model.

    • Function: Supports image decoding.

    • Installation: Built into ComfyUI.

    • Dependencies: Requires ae.safetensors.

  11. SaveImage

    • Purpose: Saves the generated image.

    • Function: Saves as PNG with “ComfyUI” prefix.

    • Installation: Built into ComfyUI.

  12. DeepTranslatorTextNode

    • Purpose: Translates input prompts.

    • Function: Converts English to Chinese (e.g., “Product on a white sink…”), supports multilingual input.

    • Installation: Requires ComfyUI_Custom_Nodes_AlekPet, install via ComfyUI Manager (search “AlekPet”) or GitHub (https://github.com/AlekPet/ComfyUI_Custom_Nodes_AlekPet).

  13. ShowText|pysssss

    • Purpose: Displays translated text.

    • Function: Useful for debugging or verifying translations.

    • Installation: Requires ComfyUI-Custom-Scripts, install via ComfyUI Manager (search “Custom-Scripts”) or GitHub (https://github.com/pysssss/ComfyUI-Custom-Scripts).

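  14. CR Prompt Text

    • Purpose: Provides the text box where the English prompt is entered.

    • Installation: Requires the Comfyroll custom nodes, install via ComfyUI Manager (search “Comfyroll”).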
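
The installation notes above can be condensed into a small lookup that reports which plugins a given set of nodes needs. This is a sketch only: the helper required_plugins is hypothetical, FluxGuidance is treated as part of core ComfyUI (an assumption), and the "Comfyroll" entry follows the plugin note in Notes and Tips.

```python
# Node class name -> package that provides it, per the component list above.
CORE = "ComfyUI"
NODE_SOURCES = {
    "DualCLIPLoader": CORE,
    "CLIPTextEncode": CORE,
    "EmptyLatentImage": CORE,
    "KSamplerAdvanced": CORE,
    "LoraLoaderModelOnly": CORE,
    "VAEDecode": CORE,
    "FluxGuidance": CORE,  # assumption: shipped with ComfyUI's Flux support
    "ConditioningZeroOut": CORE,
    "UNETLoader": CORE,
    "VAELoader": CORE,
    "SaveImage": CORE,
    "DeepTranslatorTextNode": "ComfyUI_Custom_Nodes_AlekPet",
    "ShowText|pysssss": "ComfyUI-Custom-Scripts",
    "CR Prompt Text": "Comfyroll",
}


def required_plugins(node_types):
    """Return (plugins, unknown): non-core packages needed, and unrecognized nodes."""
    plugins = set()
    unknown = set()
    for name in node_types:
        source = NODE_SOURCES.get(name)
        if source is None:
            unknown.add(name)
        elif source != CORE:
            plugins.add(source)
    return sorted(plugins), sorted(unknown)


plugins, unknown = required_plugins(NODE_SOURCES)
print("plugins to install:", plugins)
```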

Workflow Structure

  1. Prompt Input and Translation Group

    • Nodes: CR Prompt Text → DeepTranslatorTextNode → ShowText|pysssss

    • Role: Inputs English prompts and translates them to Chinese for debugging or reference.

    • Input Parameters: English prompt (e.g., “A highly detailed, red-toned digital illustration…”).

    • Output: Translated Chinese prompt (e.g., “一个高度详细的红色数字插图…”).

  2. Model Loading and Encoding Group

    • Nodes: UNETLoader → LoraLoaderModelOnly (model branch); DualCLIPLoader → CLIPTextEncode (text branch)

    • Role: Loads Flux.1, LoRA, and CLIP encoders, encoding the prompt into conditioning.

    • Input Parameters: Prompt, model paths, LoRA strength (0.8).

    • Output: Encoded positive conditioning.

  3. Conditioning Adjustment Group

    • Nodes: FluxGuidance → ConditioningZeroOut

    • Role: Adjusts positive conditioning strength (3.5) and creates an empty negative condition.

    • Input Parameters: Encoded conditioning.

    • Output: Adjusted positive and negative conditioning.

  4. Image Generation Group

    • Nodes: EmptyLatentImage → KSamplerAdvanced → VAEDecode

    • Role: Generates a 1024x1280 latent, samples it, and decodes it into an image.

    • Input Parameters: Resolution (1024x1280), steps (30), guidance (3.5).

    • Output: High-quality image.

  5. Output Group

    • Nodes: SaveImage

    • Role: Saves the generated image.

    • Input Parameters: Generated image data.

    • Output: PNG image file.
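
The groups above can be sketched as a Python dict in the style of ComfyUI's API (prompt) format, where each node has a class_type and inputs, and a link is written as [source_node_id, output_index]. Several details here are assumptions rather than values from this guide: the node ids are illustrative labels, the input names reflect the built-in nodes as commonly exported and should be checked against your own exported workflow, cfg 1.0 and the "simple" scheduler are placeholders, and the translation nodes are omitted because their output is only displayed. The helper dangling_links is hypothetical.

```python
def build_graph(prompt_text, seed=349017919967907):
    """Return the workflow as a dict resembling ComfyUI's API format."""
    return {
        "unet": {"class_type": "UNETLoader",
                 "inputs": {"unet_name": "flux1-dev.sft", "weight_dtype": "default"}},
        "lora": {"class_type": "LoraLoaderModelOnly",
                 "inputs": {"model": ["unet", 0],
                            "lora_name": "flux-lora-建筑3D立体剪纸-03.safetensors",
                            "strength_model": 0.8}},
        "clip": {"class_type": "DualCLIPLoader",
                 "inputs": {"clip_name1": "t5xxl_fp16.safetensors",
                            "clip_name2": "clip_l.safetensors", "type": "flux"}},
        "encode": {"class_type": "CLIPTextEncode",
                   "inputs": {"clip": ["clip", 0], "text": prompt_text}},
        "guidance": {"class_type": "FluxGuidance",
                     "inputs": {"conditioning": ["encode", 0], "guidance": 3.5}},
        # Follows the FluxGuidance → ConditioningZeroOut arrow in group 3.
        "negative": {"class_type": "ConditioningZeroOut",
                     "inputs": {"conditioning": ["guidance", 0]}},
        "latent": {"class_type": "EmptyLatentImage",
                   "inputs": {"width": 1024, "height": 1280, "batch_size": 1}},
        "sampler": {"class_type": "KSamplerAdvanced",
                    "inputs": {"model": ["lora", 0], "add_noise": "enable",
                               "noise_seed": seed, "steps": 30,
                               "cfg": 1.0,             # assumption, not from the guide
                               "sampler_name": "dpmpp_2m",
                               "scheduler": "simple",  # assumption, not from the guide
                               "positive": ["guidance", 0],
                               "negative": ["negative", 0],
                               "latent_image": ["latent", 0],
                               "start_at_step": 0, "end_at_step": 10000,
                               "return_with_leftover_noise": "disable"}},
        "vae": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
        "decode": {"class_type": "VAEDecode",
                   "inputs": {"samples": ["sampler", 0], "vae": ["vae", 0]}},
        "save": {"class_type": "SaveImage",
                 "inputs": {"images": ["decode", 0], "filename_prefix": "ComfyUI"}},
    }


def dangling_links(graph):
    """List (node_id, input_name) pairs whose link points at a missing node."""
    bad = []
    for node_id, node in graph.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and len(value) == 2 and value[0] not in graph:
                bad.append((node_id, name))
    return bad


graph = build_graph("A highly detailed, red-toned digital illustration…")
print(len(graph), "nodes;", len(dangling_links(graph)), "dangling links")
```

Writing the graph this way makes the two branches explicit: the model branch (UNETLoader, LoRA) and the text branch (encoders, conditioning) only meet at the sampler.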

Inputs and Outputs

  • Expected Inputs:

    • Text prompt: Multiline English description (e.g., “A highly detailed, red-toned digital illustration…”).

    • Resolution: 1024x1280.

    • Seed: Randomized (349017919967907 in this example).

    • Sampling steps: 30.

    • Guidance: 3.5.

    • LoRA strength: 0.8.

  • Final Output:

    • 1024x1280 high-quality image, saved as PNG (prefix “ComfyUI”).
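
The inputs listed above can be gathered into one settings object so that a run is easy to reproduce or vary. This is a minimal sketch: the class GenerationInputs and its validate method are hypothetical, the multiple-of-8 check reflects the step size I believe EmptyLatentImage uses, and MAX_SEED is an assumed upper bound for the seed field.

```python
import random
from dataclasses import dataclass, field

MAX_SEED = 0xFFFFFFFFFFFFFFFF  # assumed 64-bit upper bound for the seed


@dataclass
class GenerationInputs:
    prompt: str
    width: int = 1024
    height: int = 1280
    steps: int = 30
    guidance: float = 3.5
    lora_strength: float = 0.8
    # A fresh random seed per instance unless one is given explicitly.
    seed: int = field(default_factory=lambda: random.randint(0, MAX_SEED))

    def validate(self):
        """Return a list of human-readable problems; empty means usable."""
        problems = []
        if not self.prompt.strip():
            problems.append("prompt is empty")
        if self.width % 8 or self.height % 8:
            problems.append("width and height should be multiples of 8")
        if self.steps < 1:
            problems.append("steps must be at least 1")
        if not 0 <= self.seed <= MAX_SEED:
            problems.append("seed is out of range")
        return problems


# Reuse the seed from this guide to repeat the example run.
inputs = GenerationInputs(prompt="A highly detailed, red-toned digital illustration…",
                          seed=349017919967907)
print(inputs.validate())
```

Leaving seed unset draws a new random value each time, which matches the randomized behavior described above.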

Notes and Tips

  1. Resource Requirements: Flux.1 requires significant VRAM (12GB+ recommended); use fp8 versions if VRAM is limited.

  2. Model Files: Ensure all files (flux1-dev.sft, t5xxl_fp16.safetensors, etc.) are in the folders listed under Core Models; otherwise the loader nodes will not find them and the workflow will fail to run.

  3. Performance Optimization: Reduce steps (e.g., from 30 to 20) if generation is slow.

  4. Plugin Installation: Install the AlekPet, Custom-Scripts, and Comfyroll plugins; without them the translation, text display, and prompt input nodes will not load.

  5. Translation: DeepTranslatorTextNode uses Google Translate; ensure network access or configure a proxy.
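
To illustrate the network dependency in tip 5, here is a sketch of a translation call that falls back to the original prompt when the service cannot be reached. It assumes the node wraps the deep-translator Python package, as its name suggests; the function translate_prompt and its translate parameter are hypothetical and exist only so the fallback can be exercised offline.

```python
def translate_prompt(text, translate=None):
    """Translate English text to Chinese; return the input unchanged on failure."""
    try:
        if translate is None:
            # Imported lazily so this sketch still loads without deep-translator.
            from deep_translator import GoogleTranslator
            translate = GoogleTranslator(source="en", target="zh-CN").translate
        # An empty result is treated like a failure and the original is kept.
        return translate(text) or text
    except Exception as exc:  # missing package, no network, service error
        print(f"translation failed ({exc}); using the original prompt")
        return text


# Offline demonstration with a stand-in translator that always fails.
def offline(_text):
    raise ConnectionError("no network")


print(translate_prompt("A highly detailed, red-toned digital illustration", translate=offline))
```

Because the translated text is only used for reference in this workflow, falling back to the English prompt keeps generation working when the translation service is unavailable.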