Boost Your Image Generation Game with Stable Diffusion, JOY Caption Two, and LORA
📝 Workflow Overview
This workflow is designed to reverse engineer prompts from reference images and generate new images using Stable Diffusion. It combines JOY Caption Two for prompt inference and FLUX with LORA models for enhanced image generation, producing high-quality images and allowing for comparison between input and output images.
🧠 Core Models
1️⃣ UNet (Stable Diffusion)
Function: The primary neural network responsible for noise removal and image generation.
Model Used:
基础算法_F.1
Installation:
Install via ComfyUI Manager.
Or manually download the .safetensors file and place it in models/checkpoints.
2️⃣ VAE (Variational Autoencoder)
Function: Enhances image quality, particularly in detail and color.
Model Used:
ae.sft
Installation:
Install via ComfyUI Manager.
Or manually download the .vae.pt file and place it in models/vae.
3️⃣ CLIP (Text Encoder)
Function: Converts text prompts into vectors for image generation.
Model Used:
t5xxl_fp8_e4m3fn
Installation:
Install via ComfyUI Manager.
Or manually download the .pt files and place them in models/clip.
4️⃣ JOY Caption Two (Prompt Inference)
Function: Describes input images and suggests suitable prompts for generation.
Model Used:
unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
Installation:
Requires the JOY Caption Two custom-node plugin and the Llama 3.1 model listed above.
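For context, the sketch below shows how a pre-quantized 4-bit Llama 3.1 checkpoint of this kind is typically loaded. It is an illustration only, assuming the Hugging Face transformers and bitsandbytes packages are installed; the JOY Caption Two plugin handles model loading internally.

```python
# Minimal sketch, not the plugin's actual code: loading the pre-quantized
# 4-bit Llama 3.1 checkpoint with Hugging Face transformers (bitsandbytes
# must be installed for the 4-bit weights to load).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the quantized layers on the available GPU
)
```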
5️⃣ LORA (Style Enhancement)
Function: Enhances specific styles such as Chinese New Year themes or Floral Snake aesthetics.
Models Used:
J_3D图标素材2_中国新年_V_Flux
趣味-F.1- | 花样美蛇_V1
Installation:
Install via ComfyUI Manager.
Or manually place them in models/lora (see the path-check sketch below).
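If you install any of the files manually, a quick listing of the model folders helps confirm everything landed in the right place before running the workflow. A minimal sketch, assuming a default ComfyUI directory layout; adjust the root path to your install location.

```python
# Minimal sketch: list the contents of the ComfyUI model folders used by this
# workflow to confirm the manually placed files are where the loaders expect them.
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # adjust to your install location

for folder in ["models/checkpoints", "models/vae", "models/clip", "models/lora"]:
    path = COMFYUI_ROOT / folder
    files = sorted(p.name for p in path.glob("*")) if path.exists() else []
    print(f"{folder}: {files if files else 'MISSING OR EMPTY'}")
```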
📦 Key Components (Nodes)
| Node | Function |
|---|---|
| UNETLoader | Loads the UNet model. |
| VAELoader | Loads the VAE model. |
| DualCLIPLoader | Loads the CLIP language model. |
| LoraLoaderModelOnly | Loads LORA models for style enhancement. |
| LoadImage | Loads the reference image. |
| ImageResizeKJ | Resizes the input image. |
| Joy_caption_two_load | Loads the JOY Caption Two model. |
| Joy_caption_two | Generates descriptive text from the input image. |
| ShowText | Displays the inferred prompt. |
| CLIPTextEncode | Converts the inferred text prompt into vector form. |
| KSampler | Handles the sampling and image generation process. |
| VAEEncode | Encodes the input image into a latent space. |
| VAEDecode | Decodes the latent space into the final image. |
| SaveImage | Saves the generated output image. |
| Image Comparer (rgthree) | Compares the input and output images. |
📂 Major Workflow Groups
1️⃣ JOY Caption Two - Prompt Inference
Function: Uses JOY Caption Two to generate descriptive prompts from the input image.
Key Components:
Joy_caption_two_load
Joy_caption_two
ShowText
Input: Image
Output: Descriptive text (for Stable Diffusion)
2️⃣ Base Model Loading
Function: Loads the UNet, VAE, and CLIP models.
Key Components:
UNETLoader
VAELoader
DualCLIPLoader
3️⃣ Reference Image Input
Function: Loads and resizes the reference image.
Key Components:
LoadImage
ImageResizeKJ
4️⃣ LORA Model Selection
Function: Selects LORA models for style enhancement.
Key Components:
LoraLoaderModelOnly
5️⃣ Prompt Inference Result Input
Function: Encodes the JOY Caption Two-generated text into vectors for Stable Diffusion.
Key Components:
CLIPTextEncode
ConditioningZeroOut
6️⃣ Image Generation
Function: Generates the final image using UNet and VAE.
Key Components:
KSampler
VAEDecode
SaveImage
7️⃣ Image Comparison
Function: Compares the reference image with the generated image.
Key Components:
Image Comparer (rgthree)
🔢 Inputs & Outputs
📥 Main Inputs
Reference Image (for prompt inference)
LORA Selection (for style enhancement)
Sampling Parameters (see the API sketch after this section):
Seed Value (randomization control)
Sampling Method (Euler, DPM++, etc.)
Sampling Steps (default 25 steps)
Text Prompt (generated via JOY Caption Two)
📤 Main Outputs
Final high-quality generated image
Reverse-engineered descriptive text
Comparison between the reference and generated images
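The inputs above can also be set programmatically once the workflow is assembled. The sketch below is an illustration only: it assumes a local ComfyUI server on the default port 8188 and a workflow exported with "Save (API Format)"; the file name and node id are placeholders, not part of this workflow.

```python
# Minimal sketch: queue the workflow through ComfyUI's HTTP API and override
# the KSampler seed/steps. "workflow_api.json" and node id "3" are
# placeholders -- use your exported file and the real node id found inside it.
import json
import urllib.request

with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

workflow["3"]["inputs"]["seed"] = 42    # randomization control
workflow["3"]["inputs"]["steps"] = 25   # default 25 steps

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request("http://127.0.0.1:8188/prompt", data=payload)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```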
⚠️ Important Considerations
Hardware Requirements
Requires a GPU with at least 8 GB of VRAM (12 GB+ recommended).
JOY Caption Two can be memory-intensive; consider 4-bit quantized models.
LORA Model Compatibility
Different LORA models can noticeably change the result; experiment with combinations for the best output.
Prompt Optimization
Reverse-engineered prompts may need manual refinement for best results.
Sampling Parameters
Lower sampling steps may lead to loss of detail (recommended 25–50 steps).
Euler is faster, while DPM++ provides higher quality.
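As a rough starting point, the presets below reflect the speed/quality trade-off described above. They are illustrative values, not settings prescribed by this workflow, and the sampler names follow ComfyUI's internal identifiers.

```python
# Illustrative KSampler presets only -- tune the sampler and step count per LORA and prompt.
fast_preview = {"sampler_name": "euler",    "steps": 25}  # quicker, slightly softer detail
high_quality = {"sampler_name": "dpmpp_2m", "steps": 40}  # slower, crisper fine detail
```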