"Unlocking Artistic Potential: A Deep Dive into the Flux.1 and Florence-2 Workflow"

ComfyUI.org
2025-03-13 08:08:43

Workflow Overview

This workflow generates high-quality oil painting-style images with Flux.1 (a diffusion-based text-to-image model from Black Forest Labs, separate from the Stable Diffusion family) and uses Florence-2 image-to-text captioning to produce the descriptive prompt. It operates in two main phases:

  1. Image Captioning Phase: Generates a detailed text description from an input image (A2.png).

  2. Image Generation Phase: Uses the generated prompt with Flux.1 and multiple LoRA models to create an oil painting-style image.

Core Models

  1. Flux.1:

    • Function: Advanced diffusion-based model for high-resolution image generation.

    • Source: Download the model file (e.g., flux1-dev.sft) from an official source such as Hugging Face.

  2. VAE (Variational Autoencoder):

    • Function: Decodes latent representations into the final pixel images.

    • Source: Uses ae.sft, manually downloaded and placed in models/vae.

  3. Florence-2:

    • Function: Image-to-text model for generating detailed captions.

    • Source: Downloaded via DownloadAndLoadFlorence2Model from microsoft/Florence-2-large.

Component Explanation

  1. UNETLoader (ID: 10):

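    • Purpose: Loads the Flux.1 diffusion model (named 基础算法_F.1, "Base Algorithm F.1", in this workflow). The node is called UNETLoader even though Flux.1 uses a transformer backbone rather than a U-Net.

    • Installation: Default ComfyUI node; the model file must be placed manually.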

  2. DualCLIPLoader (ID: 11):

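    • Purpose: Loads the two text encoders Flux uses: the T5-XXL encoder (t5xxl_fp8_e4m3fn) and the CLIP-L encoder (clip_l).

    • Installation: Ships with current ComfyUI releases; update ComfyUI (e.g., via ComfyUI Manager) if the node or its Flux option is missing.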

  3. CLIPTextEncodeFlux (ID: 74):

    • Purpose: Encodes text prompts into conditioning vectors for Flux.

    • Installation: Flux-specific node that ships with current ComfyUI releases; update ComfyUI if it is missing.

  4. KSampler (ID: 22):

    • Purpose: Runs the sampling loop that produces the latent representation of the image.

    • Installation: Core ComfyUI component, default.

  5. VAEDecode (ID: 20):

    • Purpose: Decodes the latent into an image using the VAE.

    • Installation: Default node.

  6. LoraLoaderModelOnly (IDs: 31, 76, 77, 78):

    • Purpose: Loads multiple LoRA models to enhance Flux with oil painting styles.

    • Installation: Default node; the LoRA files (e.g., 油画厚涂风格 Oil Painting_FLUX_FLUX_F.1, where 油画厚涂风格 means "thick-paint (impasto) oil painting style") come from Civitai or Hugging Face.

  7. DownloadAndLoadFlorence2Model (ID: 81):

    • Purpose: Downloads and loads the Florence-2 model.

    • Installation: Requires Florence-2 plugin (e.g., ComfyUI-Florence2).

  8. Florence2Run (ID: 80):

    • Purpose: Runs Florence-2 to generate image captions.

    • Installation: Same as above.

  9. StringFunction|pysssss (ID: 73):

    • Purpose: Post-processes the Florence-2 caption by appending additional descriptive text.

    • Installation: Requires the pysssss custom node pack (ComfyUI-Custom-Scripts, from GitHub).
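
As an illustration of how the four LoraLoaderModelOnly nodes stack on top of UNETLoader, here is a minimal sketch in ComfyUI's API (JSON) workflow format, written as Python dictionaries. The node IDs come from the list above, but the chaining order, the LoRA file names, and the strength values are placeholders chosen for illustration and are not taken from the original workflow.

```python
def chain_loras(base_id, loras):
    """Build API-format LoraLoaderModelOnly nodes chained after `base_id`.

    `loras` is a list of (node_id, lora_file_name, strength) tuples.
    Returns (nodes, last_id), where last_id is the node whose MODEL
    output should be wired into the sampler.
    """
    nodes = {}
    previous = base_id
    for node_id, lora_name, strength in loras:
        nodes[node_id] = {
            "class_type": "LoraLoaderModelOnly",
            "inputs": {
                "model": [previous, 0],   # link: [source node id, output index]
                "lora_name": lora_name,
                "strength_model": strength,
            },
        }
        previous = node_id
    return nodes, previous


# UNETLoader (ID 10) as the base of the chain.
graph = {
    "10": {
        "class_type": "UNETLoader",
        "inputs": {"unet_name": "flux1-dev.sft", "weight_dtype": "default"},
    }
}

# Hypothetical file names and strengths; the order of IDs is assumed.
lora_nodes, model_out = chain_loras(
    "10",
    [
        ("31", "lora_a.safetensors", 0.8),
        ("76", "lora_b.safetensors", 0.6),
        ("77", "lora_c.safetensors", 0.6),
        ("78", "lora_d.safetensors", 0.5),
    ],
)
graph.update(lora_nodes)
```

Each loader takes the previous node's model output as its input, so `model_out` ("78" here) is the enhanced model that the sampler would consume.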

Workflow Structure

  1. Captioning Group (Group: 反推, "reverse prompting"):

    • Role: Generates descriptive prompts from an input image.

    • Nodes: LoadImage and DownloadAndLoadFlorence2Model both feed Florence2Run → StringFunction|pysssss.

    • Inputs: Image A2.png.

    • Outputs: Text description (e.g., “A romantic oil painting...”).

  2. Model Loading Group:

    • Role: Loads Flux.1 and enhancing LoRA models.

    • Nodes: UNETLoader → multiple LoraLoaderModelOnly.

    • Inputs: Model and LoRA file paths.

    • Outputs: Enhanced model.

  3. Text Encoding Group:

    • Role: Encodes text prompts into conditions.

    • Nodes: DualCLIPLoader feeds both CLIPTextEncodeFlux (positive prompt) and CLIPTextEncode (negative prompt).

    • Inputs: Positive prompt (from captioning), negative prompt (NSFW).

    • Outputs: Conditioning vectors.

  4. Image Generation Group:

    • Role: Generates the final image.

    • Nodes: EmptyLatentImage → KSampler → VAEDecode → PreviewImage.

    • Inputs: Resolution (904x1600), sampling parameters.

    • Outputs: Oil painting-style image.
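
The dependencies between the four groups can be sketched as a small graph and sorted into a valid execution order. This is only an illustration of the wiring described above, not ComfyUI's own scheduler; the VAELoader node is an assumption (the article names the VAE file but not the node that loads it), and the four LoRA loaders are collapsed into a single entry.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose outputs it consumes.
DEPENDS_ON = {
    "Florence2Run": {"LoadImage", "DownloadAndLoadFlorence2Model"},
    "StringFunction|pysssss": {"Florence2Run"},
    "LoraLoaderModelOnly (x4)": {"UNETLoader"},
    "CLIPTextEncodeFlux": {"DualCLIPLoader", "StringFunction|pysssss"},
    "CLIPTextEncode": {"DualCLIPLoader"},
    "KSampler": {
        "LoraLoaderModelOnly (x4)",
        "CLIPTextEncodeFlux",
        "CLIPTextEncode",
        "EmptyLatentImage",
    },
    "VAEDecode": {"KSampler", "VAELoader"},  # VAELoader is assumed
    "PreviewImage": {"VAEDecode"},
}


def execution_order(depends_on):
    """Return one valid order in which the nodes could run."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
for step, node in enumerate(order, start=1):
    print(f"{step:2d}. {node}")
```

Several orders are valid, but the captioning nodes always run before CLIPTextEncodeFlux, and PreviewImage always comes last because nothing depends on it.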

Inputs and Outputs

  • Input Parameters:

    • Image: A2.png (placed in input folder).

    • Resolution: 904x1600 (set by EmptyLatentImage).

    • Seed: Randomized (set in KSampler).

    • Prompt: Auto-generated by Florence-2 and extended with additional descriptions by StringFunction|pysssss.

  • Output Results:

    • PNG image in oil painting style, sized 904x1600.
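
To make the resolution and seed parameters concrete, the sketch below works out the latent grid that a 904x1600 image corresponds to and draws a random seed. It assumes the usual 8x spatial downscale between pixels and latents and a 64-bit unsigned seed range; the helper names are hypothetical and are not part of ComfyUI.

```python
import random

VAE_DOWNSCALE = 8  # assumed pixel-to-latent downscale factor


def latent_size(width, height, factor=VAE_DOWNSCALE):
    """Return (latent_width, latent_height) for a pixel resolution."""
    if width % factor or height % factor:
        raise ValueError(f"{width}x{height} is not a multiple of {factor}")
    return width // factor, height // factor


def random_seed(rng=random):
    """Draw a seed in the 64-bit unsigned range (assumed KSampler range)."""
    return rng.randint(0, 2**64 - 1)


print(latent_size(904, 1600))  # (113, 200)
print(random_seed())
```

Both 904 and 1600 divide evenly by 8, so the sampler works on a 113x200 latent grid and the decoded image comes out at exactly the requested size.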

Notes and Considerations

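  1. Model Files: Ensure the Flux.1 model (基础算法_F.1) is in models/unet, the VAE (ae.sft) in models/vae, and the LoRA files in models/loras.

  2. Plugin Installation: Install the Florence-2 and pysssss custom nodes (for example through ComfyUI Manager); missing custom nodes cause node errors when the workflow loads. The Flux nodes ship with current ComfyUI releases, so update ComfyUI if they are missing.

  3. Performance: Flux.1 and Florence-2 are GPU-intensive; 12 GB or more of VRAM is recommended.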

  4. Troubleshooting: Check file paths if “model not found” errors occur; ensure plugin compatibility.
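
When a “model not found” error appears, a quick file check can narrow down the cause. The following is a minimal sketch that assumes a standard ComfyUI folder layout (models/unet, models/vae) and uses the example file names mentioned in this article; adjust the root path and file names to your installation. The helper `find_missing` is hypothetical and is not part of ComfyUI.

```python
from pathlib import Path


def find_missing(models_dir, expected):
    """Return 'subfolder/file' entries that do not exist under models_dir.

    `expected` maps a subfolder of the models directory to the file
    names that should be present in it.
    """
    root = Path(models_dir)
    missing = []
    for subfolder, names in expected.items():
        for name in names:
            if not (root / subfolder / name).is_file():
                missing.append(f"{subfolder}/{name}")
    return missing


# File names taken from the article; LoRA files can be added under "loras".
EXPECTED = {
    "unet": ["flux1-dev.sft"],
    "vae": ["ae.sft"],
}

if __name__ == "__main__":
    # "ComfyUI/models" is an assumed install location.
    for item in find_missing("ComfyUI/models", EXPECTED):
        print("missing:", item)
```

An empty result means the listed files are where the loaders look for them, which points the investigation toward plugin compatibility instead.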