Unleash Artistic Potential: Leveraging Flux.1 for Hand-Drawn Watercolor Images

ComfyUI.org
2025-03-12 08:01:43

Workflow Overview

(Workflow overview diagram)

This workflow’s primary purpose is to leverage the Flux.1 model and depth control techniques to generate high-quality artistic-style images (hand-drawn watercolor) from an input image, enhanced by Joy2 captioning to derive descriptive prompts. The specific goals are:

  • Image Processing and Generation: Generate a 1024x1024 artistic-style image based on the input image (20230304185125_b966e.jpg).

  • Depth Control: Use the DepthAnythingV2 model to extract depth information and guide generation via ControlNet.

  • Prompt Optimization: Utilize the Joy_caption_two node to reverse-engineer detailed descriptive text from the input image, combined with predefined prompts for final generation. This workflow is suitable for art creation, image stylization, or generating hand-drawn effects from photos.
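
For readers who prefer to queue this workflow from a script rather than the web interface, the minimal sketch below posts an exported API-format workflow JSON to a local ComfyUI server. The file name watercolor_workflow_api.json and the default address 127.0.0.1:8188 are assumptions; adjust them to your setup.

```python
# Minimal sketch: queue an exported API-format workflow against a local ComfyUI server.
# Assumes the workflow was exported via "Save (API Format)" as watercolor_workflow_api.json
# and that ComfyUI is listening on the default 127.0.0.1:8188.
import json
import uuid
import urllib.request

SERVER = "http://127.0.0.1:8188"

def queue_workflow(path: str) -> dict:
    with open(path, "r", encoding="utf-8") as f:
        workflow = json.load(f)
    payload = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())}).encode("utf-8")
    req = urllib.request.Request(f"{SERVER}/prompt", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # contains the prompt_id assigned by the server

if __name__ == "__main__":
    print(queue_workflow("watercolor_workflow_api.json"))
```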

Core Models

  1. Flux.1 (基础算法_F.1)

    • Function: An efficient text-to-image model supporting high-resolution generation, ideal for artistic-style images.

    • Source: Download from Civitai or the official Flux repositories and place it in ComfyUI/models/unet/ (the folder read by the UNETLoader node used in this workflow), e.g., 基础算法_F.1_fp8_e4m3fn.safetensors.

  2. DepthAnythingV2 (depth_anything_v2_vitl_fp32.safetensors)

    • Function: Extracts depth information from images for ControlNet guidance, enhancing spatial structure.

    • Source: Automatically downloaded via DownloadAndLoadDepthAnythingV2Model, stored in ComfyUI/models/.

  3. Lora Model (姑苏_F.1-手绘水彩风萌宠_V1.0)

    • Function: Fine-tunes the Flux.1 model to generate hand-drawn watercolor-style pet images.

    • Source: Download from Civitai or custom Lora repositories, place in ComfyUI/models/loras/.

  4. Upscale Model (4x-UltraSharp)

    • Function: Upscales generated images to enhance details.

    • Source: Download from ComfyUI model library, place in ComfyUI/models/upscale_models/.
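
Because several of these files have non-obvious names and locations, a quick pre-flight check can save a failed run. The sketch below only verifies that the files listed in this article exist; the exact extensions and the ComfyUI root path are assumptions you may need to adjust.

```python
# Minimal pre-flight check that the model files referenced by this workflow exist.
# File names follow the article; extensions and the ComfyUI root path are assumptions.
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # adjust to your installation path

REQUIRED_FILES = [
    COMFYUI_ROOT / "models/unet/基础算法_F.1_fp8_e4m3fn.safetensors",   # newer versions also read models/diffusion_models/
    COMFYUI_ROOT / "models/loras/姑苏_F.1-手绘水彩风萌宠_V1.0.safetensors",
    COMFYUI_ROOT / "models/controlnet/XLabs-flux-depth-controlnet_v3.safetensors",
    COMFYUI_ROOT / "models/upscale_models/4x-UltraSharp.pth",
    COMFYUI_ROOT / "models/vae/ae.sft",
    COMFYUI_ROOT / "models/clip/clip_l.safetensors",
    COMFYUI_ROOT / "models/clip/t5xxl_fp16.safetensors",
]

missing = [p for p in REQUIRED_FILES if not p.exists()]
if missing:
    print("Missing model files:")
    for p in missing:
        print(f"  {p}")
else:
    print("All required model files found.")
```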

Component Explanation

Below are the key nodes in the workflow, including their purpose, function, and installation method, along with dependencies:

  1. Joy_caption_two_load

    • Purpose: Loads the Joy2 pipeline for image captioning.

    • Function: Outputs a JoyTwoPipeline object that wraps the Llama 3.1-based captioning model.

    • Installation: Requires a JoyCaption custom-node plugin; install via ComfyUI Manager (search “JoyCaption”).

    • Dependencies: Requires unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit model, download and place in ComfyUI/models/joy_caption/.

  2. Joy_caption_two

    • Purpose: Generates descriptive text from input images.

    • Function: Outputs a detailed description of the image content; here it runs in Descriptive mode with a maximum caption length of 150.

    • Installation: Shares plugin with Joy_caption_two_load.

    • Dependencies: Requires JoyTwoPipeline.

  3. ttN concat

    • Purpose: Concatenates multiple text strings.

    • Function: Merges the predefined style text (e.g., “Hand-drawn watercolor illustration”) with the Joy2-generated description; a rough Python equivalent appears after this component list.

    • Installation: Requires the tinyterraNodes (ttN) plugin, install via ComfyUI Manager (search “tinyterra”) or GitHub (https://github.com/TinyTerra/ComfyUI_tinyterraNodes).

  4. ShowText|pysssss

    • Purpose: Displays and passes text content.

    • Function: Shows Joy2-generated descriptions or merged text.

    • Installation: Provided by the pythongosssss ComfyUI-Custom-Scripts plugin; install via ComfyUI Manager (search “pysssss”).

  5. LoadFluxControlNet

    • Purpose: Loads a Flux-compatible ControlNet model.

    • Function: Outputs a FluxControlNet object for depth control.

    • Installation: Requires the XLabs plugin, install via ComfyUI Manager (search “x-flux-comfyui”) or GitHub (https://github.com/XLabs-AI/x-flux-comfyui).

    • Dependencies: Requires XLabs-flux-depth-controlnet_v3 file, download and place in ComfyUI/models/controlnet/.

  6. ApplyFluxControlNet

    • Purpose: Applies ControlNet depth control.

    • Function: Combines the depth map with the text conditioning so that the scene structure guides generation.

    • Installation: Shares plugin with LoadFluxControlNet.

    • Dependencies: Requires depth map input.

  7. DownloadAndLoadDepthAnythingV2Model

    • Purpose: Downloads and loads the DepthAnythingV2 model.

    • Function: Automatically retrieves the depth model for use.

    • Installation: Requires the DepthAnything V2 plugin, install via ComfyUI Manager (search “DepthAnythingV2”) or GitHub (https://github.com/kijai/ComfyUI-DepthAnythingV2).

  8. DepthAnything_V2

    • Purpose: Generates depth maps from input images.

    • Function: Outputs a depth map for ControlNet use; see the depth-preview sketch after this component list.

    • Installation: Shares plugin with DownloadAndLoadDepthAnythingV2Model.

    • Dependencies: Requires depth_anything_v2_vitl_fp32.safetensors.

  9. ImageResize+

    • Purpose: Resizes input images.

    • Function: Adjusts the image to 1024x1024, maintaining proportions.

    • Installation: Requires the ComfyUI_essentials plugin; install via ComfyUI Manager (search “essentials”).

  10. DualCLIPLoader

    • Purpose: Loads CLIP models.

    • Function: Outputs CLIP objects for text encoding.

    • Installation: Built into ComfyUI.

    • Dependencies: Requires clip_l and t5xxl_fp16 files, place in ComfyUI/models/clip/.

  11. UNETLoader

    • Purpose: Loads the Flux.1 UNET model.

    • Function: Outputs a model object to drive generation.

    • Installation: Built into ComfyUI.

    • Dependencies: Requires 基础算法_F.1_fp8_e4m3fn file.

  12. LoraLoader

    • Purpose: Loads a Lora model.

    • Function: Fine-tunes the model for hand-drawn watercolor style.

    • Installation: Built into ComfyUI.

    • Dependencies: Requires 姑苏_F.1-手绘水彩风萌宠_V1.0 file.

  13. EmptyLatentImage

    • Purpose: Creates an initial latent image.

    • Function: Provides a 1024x1024 latent space for generation.

    • Installation: Built into ComfyUI.

  14. XlabsSampler

    • Purpose: Performs sampling for generation.

    • Function: Combines model, conditioning, and ControlNet to generate latent images.

    • Installation: Requires XLabs plugin.

  15. VAEDecode

    • Purpose: Decodes latent images into pixel images.

    • Function: Outputs the generated image.

    • Installation: Built into ComfyUI.

    • Dependencies: Requires the ae.sft VAE file, placed in ComfyUI/models/vae/.

  16. UpscaleModelLoader

    • Purpose: Loads an upscale model.

    • Function: Outputs an upscale model object.

    • Installation: Built into ComfyUI.

  17. ImageUpscaleWithModel

    • Purpose: Upscales the generated image.

    • Function: Increases the 1024x1024 result to a higher resolution (4x with 4x-UltraSharp); see the upscaling sketch after this component list.

    • Installation: Built into ComfyUI.

  18. SaveImage

    • Purpose: Saves the generated image.

    • Function: Outputs the file to a specified path.

    • Installation: Built into ComfyUI.

  19. Image Comparer (rgthree)

    • Purpose: Compares original and generated images.

    • Function: Offers a slider comparison view to show input-output differences.

    • Installation: Requires rgthree plugin, install via ComfyUI Manager (search “rgthree”) or GitHub (https://github.com/rgthree/rgthree-comfy).
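
Functionally, the prompt-building step (Joy_caption_two plus ttN concat) just joins the fixed style trigger and the generated caption into one positive prompt. A rough Python equivalent, with an illustrative delimiter and the example caption from the end of this article:

```python
# Rough equivalent of the ttN concat step: prepend the style trigger to the Joy2 caption.
def build_positive_prompt(style_trigger: str, caption: str, delimiter: str = ", ") -> str:
    return f"{style_trigger}{delimiter}{caption}"

prompt = build_positive_prompt(
    "Hand-drawn watercolor illustration",
    "This photograph captures a large, adorable panda...",  # example Joy2 caption
)
print(prompt)  # "Hand-drawn watercolor illustration, This photograph captures ..."
```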
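
The depth map that DepthAnything_V2 feeds into ControlNet can also be previewed outside ComfyUI. Below is a minimal sketch using the Hugging Face transformers depth-estimation pipeline; the depth-anything/Depth-Anything-V2-Large-hf checkpoint name is an assumption (an HF-converted ViT-L variant), not the exact depth_anything_v2_vitl_fp32.safetensors file the node downloads.

```python
# Sketch: preview a DepthAnything V2 depth map outside ComfyUI.
# The checkpoint id is an assumption (HF-converted ViT-L weights), not the node's exact file.
from transformers import pipeline
from PIL import Image

depth_estimator = pipeline("depth-estimation",
                           model="depth-anything/Depth-Anything-V2-Large-hf")

image = Image.open("20230304185125_b966e.jpg")   # input image from the workflow
result = depth_estimator(image)
result["depth"].save("depth_preview.png")        # grayscale depth map for reference
```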
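
Finally, the 4x-UltraSharp upscaling step can be reproduced on a saved image with the spandrel library, which recent ComfyUI versions use to load upscale models. The input/output file names and the .pth extension are placeholders, and this sketch runs the full 4x pass without tiling:

```python
# Sketch: run the 4x-UltraSharp upscaler on a saved image via spandrel (CPU, no tiling).
# The input/output file names and the .pth extension are assumptions.
import numpy as np
import torch
from PIL import Image
from spandrel import ModelLoader

model = ModelLoader().load_from_file("ComfyUI/models/upscale_models/4x-UltraSharp.pth").eval()

img = Image.open("ComfyUI_00001_.png").convert("RGB")   # a generated 1024x1024 image
x = torch.from_numpy(np.array(img)).float().div(255).permute(2, 0, 1).unsqueeze(0)  # BCHW in [0, 1]

with torch.no_grad():
    y = model(x)                                         # 4x output, e.g. 4096x4096

out = (y.squeeze(0).permute(1, 2, 0).clamp(0, 1) * 255).byte().cpu().numpy()
Image.fromarray(out).save("ComfyUI_upscaled.png")
```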

Workflow Structure

  1. Joy2 Reverse Prompt Group

    • Role: Generates descriptive text from the input image to optimize prompts.

    • Input Parameters: Input image (20230304185125_b966e.jpg), mode (Descriptive), length (150).

    • Output: Detailed descriptive text (e.g., panda description paragraph).

  2. Depth Control Group

    • Role: Extracts depth information and applies ControlNet guidance.

    • Input Parameters: Input image, depth model (depth_anything_v2_vitl_fp32.safetensors), ControlNet weight (0.8).

    • Output: Depth map and ControlNet conditioning.

  3. Image Generation Group

    • Role: Executes image generation and post-processing.

    • Input Parameters: Latent image (1024x1024; see the latent sketch after this list), positive prompt (merged text), negative prompt (“Worst quality, blurry, wrong, ugly”), Lora weight (1.2), guidance scale (3.5), sampling steps (20).

    • Output: Generated image (initial 1024x1024, upscaled).
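
For reference, the 1024x1024 latent that seeds this group is simply a zero tensor at 1/8 of the pixel resolution. A minimal sketch of what ComfyUI's stock EmptyLatentImage node allocates (4 latent channels; Flux-specific latent nodes may use a different channel count):

```python
# Rough equivalent of EmptyLatentImage: a zero latent at 1/8 of the pixel resolution.
import torch

def empty_latent(width: int = 1024, height: int = 1024, batch_size: int = 1) -> torch.Tensor:
    return torch.zeros([batch_size, 4, height // 8, width // 8])

print(empty_latent().shape)  # torch.Size([1, 4, 128, 128])
```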

Inputs and Outputs

  • Expected Inputs:

    • Image: 20230304185125_b966e.jpg (initial resolution 979x923).

    • Resolution: 1024x1024.

    • Seed: 722511220491392 (randomizable).

    • Prompt: Dynamically generated (including “Hand-drawn watercolor illustration”).

    • Negative Prompt: “Worst quality, blurry, wrong, ugly”.

    • Lora Weight: 1.2.

    • Guidance Scale: 3.5.

    • Sampling Steps: 20.

  • Final Output:

    • High-quality artistic-style image (PNG format; the 1024x1024 result is further upscaled by 4x-UltraSharp).

    • Input/output comparison (displayed in the Image Comparer node).
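
For record-keeping, the settings above can be collected in one place, e.g. kept next to the exported workflow JSON. Every value below is taken from this article; only the dictionary itself is illustrative:

```python
# Generation settings for this workflow, as listed above.
settings = {
    "input_image": "20230304185125_b966e.jpg",   # initial resolution 979x923
    "resolution": (1024, 1024),
    "seed": 722511220491392,                     # randomizable
    "positive_prompt_prefix": "Hand-drawn watercolor illustration",
    "negative_prompt": "Worst quality, blurry, wrong, ugly",
    "lora_weight": 1.2,
    "guidance_scale": 3.5,
    "sampling_steps": 20,
    "controlnet_strength": 0.8,
}
```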

Notes and Tips

  1. Resource Requirements: Flux.1 and Lora generation require 12GB+ VRAM; an NVIDIA GPU is recommended.

  2. Model Files: Ensure 基础算法_F.1_fp8_e4m3fn, ae.sft, and Lora files are in the correct paths, or errors will occur.

  3. Plugin Installation: Install JoyCaption, XLabs, DepthAnything, and rgthree plugins, or nodes will be unavailable.

  4. Performance Optimization: Reduce sampling steps (20→10) or resolution (1024→512) for faster generation.

  5. Compatibility: ComfyUI version should be 0.3.18 or higher, with plugins compatible with Flux.1.

  6. Input Image: Ensure 20230304185125_b966e.jpg exists in the specified path.

Example Illustration

Suppose the input image is a panda photo; the workflow will:

  • Reverse-engineer a description: “This photograph captures a large, adorable panda...”.

  • Merge prompt: “Hand-drawn watercolor illustration, This photograph...”.

  • Generate a hand-drawn watercolor-style panda image, upscaled and saved as ComfyUI.png.