Mastering Style Transfer: A Comprehensive Guide to Image Generation

ComfyUI.org

2025-03-22 05:40:54

✅ Workflow Overview


This workflow, titled "Generate Images in the Same Style Based on a Reference Image", aims to generate images that replicate the style, composition, and color tone of a reference image.

Its core functionalities include:

✅ Loading a base model and VAE
🎯 Incorporating LoRA models to refine style
🖼️ Using a reference image for guidance
🔥 Generating new images with the same style
⚙️ Comparing the generated image with the reference

This workflow is particularly useful for style transfer, concept art generation, and generating consistent series of images with a shared aesthetic.

🔥 Core Models

Flux Base Model
- Name: 基础算法_F.1 (roughly "Base Algorithm F.1")
- Function: The base model used for generating high-quality images, supporting realistic textures and lighting.
- Loading Method: Loaded via the UNETLoader node.
LoRA Model
- Name: 锦绣风华F.1_影视级古风人像写实_v1.0 (roughly "Jinxiu Fenghua F.1: cinematic, realistic ancient-style portraits, v1.0")
- Function: A LoRA model fine-tuned for ancient Chinese-style portraits. This model influences the generated image to reflect a cinematic, classical portrait style.
- Weight: 1.0
- Loading Method: Loaded using the LoraLoaderModelOnly node.
Florence-2 Model
- Name: microsoft/Florence-2-base
- Function: A vision-language model used for image understanding and feature extraction. This model is responsible for interpreting the reference image's style.
- Loading Method: Loaded via the Florence2ModelLoader node.

🔧 Node Explanation

Here’s a detailed explanation of the key nodes used in this workflow:

🌟 Base Model Loading

UNETLoader
- Function: Loads the base model (基础算法_F.1) for image generation.
- Parameters:
  - Model: 基础算法_F.1
  - Precision: fp8_e4m3fn
- Output: Model data for sampling.
VAELoader
- Function: Loads the VAE (Variational Autoencoder) for decoding the generated image.
- Parameters:
  - Model: ae.sft
- Output: VAE data used during the encoding/decoding process.
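The `fp8_e4m3fn` precision setting matters mainly for memory. This back-of-the-envelope sketch estimates weight memory as parameters × bytes per element. It assumes the base model has about 12 billion parameters, which is the published size of FLUX.1 [dev]; the exact size of 基础算法_F.1 is not stated here. The estimate ignores activations, the text encoders, the VAE and the LoRA.

```python
# Rough weight-memory estimate for a diffusion transformer at different precisions.
# Assumption: ~12e9 parameters (FLUX.1-dev scale); adjust PARAMS for other checkpoints.
BYTES_PER_ELEMENT = {
    "fp32": 4,
    "fp16": 2,
    "bf16": 2,
    "fp8_e4m3fn": 1,  # 8-bit float: 1 sign, 4 exponent, 3 mantissa bits
}

PARAMS = 12e9


def weight_gib(params: float, dtype: str) -> float:
    """Size of the raw weights in GiB (weights only, no activations or buffers)."""
    return params * BYTES_PER_ELEMENT[dtype] / 2**30


for dtype in ("bf16", "fp8_e4m3fn"):
    print(f"{dtype:>11}: {weight_gib(PARAMS, dtype):5.1f} GiB")
```

Under this assumption, bf16 weights take about 22.4 GiB and fp8 weights about 11.2 GiB. That helps explain why fp8 loading is used here and why 12 GB of VRAM is a practical floor rather than a comfortable margin.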

🎯 LoRA Model Integration

LoraLoaderModelOnly
- Function: Loads the 锦绣风华F.1 LoRA model to refine the image style.
- Parameters:
  - Model name: 锦绣风华F.1_影视级古风人像写实_v1.0
  - Weight: 1.0
- Output: Enhanced model data used by the sampler.
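Conceptually, a LoRA stores a low-rank update for selected weight matrices. The weight (1.0 here, exposed as `strength_model` on LoraLoaderModelOnly) scales that update: W' = W + strength · (alpha / rank) · B·A. The pure-Python sketch below applies that formula to a toy 2×2 matrix. The matrices and the alpha value are made up for illustration. ComfyUI's real patching code also handles many LoRA variants that this sketch does not.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_lora(w: Matrix, down: Matrix, up: Matrix, alpha: float, strength: float) -> Matrix:
    """Return W + strength * (alpha / rank) * (up @ down).

    down (A) has shape (rank, in_features); up (B) has shape (out_features, rank).
    """
    rank = len(down)
    scale = strength * alpha / rank
    delta = matmul(up, down)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Toy example: a 2x2 base weight and a rank-1 LoRA (values are illustrative only).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # rank x in_features
B = [[0.5], [-0.5]]     # out_features x rank

for s in (0.0, 0.5, 1.0):
    print(s, apply_lora(W, A, B, alpha=1.0, strength=s))
```

A strength of 0 leaves the base weights untouched. Values between 0 and 1 blend the style in partially, and values above 1 exaggerate it. This is the knob referred to in the tips about adjusting the LoRA weight.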

🖼️ Reference Image Handling

LoadImage
- Function: Loads the reference image.
- Parameters:
  - Image file: 0012.jpg
- Output: Image data for style extraction.
Florence2ModelLoader
- Function: Loads the Florence-2 model for image analysis.
- Parameters:
  - Model name: microsoft/Florence-2-base
  - Precision: fp16
- Output: Florence-2 model data.
Florence2Run
- Function: Extracts image features and generates captions from the reference image.
- Inputs:
  - Image: Reference image
  - Florence-2 model
- Outputs:
  - Caption: Automatically generated caption based on the reference image.
  - Image features: Used for conditioning.

🔥 Sampling and Generation

KSampler
- Function: Generates latent images based on the model, conditioning, and noise.
- Parameters:
  - Random seed: 333721078257758
  - Steps: 30
  - CFG Scale: 3.5
  - Sampler type: euler
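  - Scheduler: beta
  - Denoising strength: 0.68 (below 1.0, so sampling starts from a partially noised input latent rather than pure noise, and part of the input's structure is kept)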
- Output: Latent image used for final rendering.
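With a denoising strength below 1.0, the sampler does not start from pure noise. The input latent is noised only part of the way and then denoised. Here that latent is presumably the encoded reference image, although a VAE-encode step is not listed above.

As we understand ComfyUI's KSampler, it builds a schedule for `int(steps / denoise)` steps and runs only the last `steps` of them. So 30 steps at 0.68 skip the first 14 steps of a 44-step schedule.

The sketch below reproduces that bookkeeping and runs a plain Euler loop. It makes three simplifications, all assumptions for illustration:
- A linear sigma schedule stands in for the `beta` scheduler.
- A toy denoiser that always predicts a fixed target replaces the Flux model.
- The noise form `x = x0 + sigma * noise` is simpler than Flux's own parameterization.

```python
import random
from typing import Callable, List, Tuple


def denoise_steps(steps: int, denoise: float) -> Tuple[int, int]:
    """Return (schedule_length, skipped_steps) for a partial denoise (assumes denoise > 0)."""
    if denoise > 0.9999:
        return steps, 0
    total = int(steps / denoise)
    return total, total - steps


def linear_sigmas(n: int, sigma_max: float = 1.0) -> List[float]:
    """Toy schedule: n + 1 evenly spaced noise levels from sigma_max down to 0."""
    return [sigma_max * (1 - i / n) for i in range(n + 1)]


def euler_sample(x: float, sigmas: List[float],
                 denoiser: Callable[[float, float], float]) -> float:
    """Euler method on dx/dsigma = (x - denoised) / sigma, stepping down the schedule."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(x, sigma)
        d = (x - denoised) / sigma
        x = x + (sigma_next - sigma) * d
    return x


steps, denoise = 30, 0.68
total, skipped = denoise_steps(steps, denoise)
sigmas = linear_sigmas(total)[-(steps + 1):]   # keep only the last `steps` intervals

reference_latent = 0.25   # stand-in for one value of the encoded reference image
target = 0.8              # what the toy "model" believes the clean image is
rng = random.Random(333721078257758)           # fixed seed -> repeatable noise
x_start = reference_latent + sigmas[0] * rng.gauss(0.0, 1.0)

result = euler_sample(x_start, sigmas, lambda x, sigma: target)
print(total, skipped, round(sigmas[0], 3), round(result, 6))
```

On this linear toy schedule, the first kept noise level is about 0.68, close to the denoise value. With the real `beta` schedule the starting noise level will differ, but the skip-the-early-steps bookkeeping is the same idea.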
VAEDecode
- Function: Decodes the latent image into the final image.
- Inputs:
  - Latent image
  - VAE model
- Output: Generated image.
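VAEDecode maps a latent back to pixels. This assumes the standard Flux VAE (the `ae.sft` file above), whose latents have 16 channels at 1/8 of the pixel resolution. Under that assumption, a 1536 × 1536 image corresponds to a 16 × 192 × 192 latent.

The helpers below are hypothetical sketches of that bookkeeping, not ComfyUI functions. They also count image tokens, because Flux packs the latent into 2 × 2 patches before the transformer sees it. For simplicity, the sketch rejects sizes that are not multiples of 16.

```python
from typing import Tuple


def flux_latent_shape(width: int, height: int, batch: int = 1,
                      channels: int = 16, factor: int = 8) -> Tuple[int, int, int, int]:
    """Return the (batch, channels, height, width) latent shape for an image size.

    Assumes a Flux-style VAE: 16 latent channels and 8x spatial downsampling.
    """
    if width % (factor * 2) or height % (factor * 2):
        raise ValueError("width and height should be multiples of 16 for Flux")
    return batch, channels, height // factor, width // factor


def flux_token_count(width: int, height: int) -> int:
    """Number of image tokens after packing the latent into 2x2 patches."""
    _, _, h, w = flux_latent_shape(width, height)
    return (h // 2) * (w // 2)


print(flux_latent_shape(1536, 1536))
print(flux_token_count(1536, 1536), flux_token_count(1024, 1024))
```

The token count grows with pixel area: 9,216 image tokens at 1536 × 1536 versus 4,096 at 1024 × 1024. This is one reason larger images consume noticeably more VRAM and time.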

🛠️ Image Processing

ImageResizeKJ
- Function: Resizes the image to match the reference dimensions.
- Parameters:
  - Width: 1536
  - Height: 1536
  - Algorithm: lanczos
- Output: Resized image for processing.
Image Comparer (rgthree)
- Function: Compares the generated image with the reference image.
- Mode: Side-by-side comparison with a slider.
- Output: Comparison view.

🔥 Text Encoding and Conditioning

CLIPTextEncode
- Function: Encodes the text prompt into conditioning data for the generation process.
- Prompt: White architectural design, interwoven geometric shapes, minimalist style, white and gray tones, large glass windows, seamless indoor and outdoor spaces, embellished with greenery, marked with "YIYUESJ", the overall atmosphere is fashionable and upscale. Night view effect
- Output: Text conditioning for the sampler.

🔍 Workflow Structure

The workflow is divided into several Groups, each serving a specific purpose:

✅ Base Model Loading
- Location: Top left
- Function: Loads the base model and VAE.
- Input: Model and VAE paths
- Output: Model data for image generation.
🎯 Reference Image Input
- Location: Bottom left
- Function: Loads and processes the reference image.
- Input: Image file
- Output: Image data and extracted features.
🔥 LoRA Model Selection
- Location: Center-left
- Function: Loads the LoRA model to influence the style.
- Input: Base model
- Output: Style-enhanced model.
🛠️ Prompt Writing
- Location: Center
- Function: Encodes the text prompt into conditioning data.
- Input: Text prompt
- Output: Conditioning data.
🖼️ Image Output
- Location: Right
- Function: Generates and saves the final image.
- Input: Latent image, VAE
- Output: Final image file.
🔥 Image Comparison
- Location: Top right
- Function: Compares the generated image with the reference image.
- Input: Two images
- Output: Side-by-side comparison.
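For readers who drive ComfyUI programmatically, the groups above map onto a node graph. The sketch below writes the core path (base model → LoRA → sampler → decode) as a Python dict in ComfyUI's API ("prompt") format. Each input is either a literal value or a `[node_id, output_index]` link. The sketch then checks that every link resolves and that the nodes can be ordered.

Several parts are assumptions not stated in this workflow:
- the text-encoder loader (`DualCLIPLoader` with the file names shown);
- the `VAEEncode` node feeding the reference image to the sampler, which is implied by the 0.68 denoise but not listed;
- the empty negative prompt;
- the checkpoint file extensions;
- the `SaveImage` prefix.

The Florence-2, resize and comparison nodes are omitted because their input names are not documented here.

```python
from typing import Dict, List

PROMPT = ('White architectural design, interwoven geometric shapes, minimalist style, '
          'white and gray tones, large glass windows, seamless indoor and outdoor spaces, '
          'embellished with greenery, marked with "YIYUESJ", the overall atmosphere is '
          'fashionable and upscale. Night view effect')

# Hypothetical API-format graph; node ids are arbitrary strings.
GRAPH: Dict[str, dict] = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "基础算法_F.1.safetensors",      # extension assumed
                     "weight_dtype": "fp8_e4m3fn"}},
    "2": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.sft"}},
    "3": {"class_type": "LoraLoaderModelOnly",
          "inputs": {"model": ["1", 0],
                     "lora_name": "锦绣风华F.1_影视级古风人像写实_v1.0.safetensors",
                     "strength_model": 1.0}},
    "4": {"class_type": "DualCLIPLoader",                         # assumed text encoders
          "inputs": {"clip_name1": "clip_l.safetensors",
                     "clip_name2": "t5xxl_fp8_e4m3fn.safetensors",
                     "type": "flux"}},
    "5": {"class_type": "CLIPTextEncode", "inputs": {"clip": ["4", 0], "text": PROMPT}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"clip": ["4", 0], "text": ""}},
    "7": {"class_type": "LoadImage", "inputs": {"image": "0012.jpg"}},
    "8": {"class_type": "VAEEncode", "inputs": {"pixels": ["7", 0], "vae": ["2", 0]}},
    "9": {"class_type": "KSampler",
          "inputs": {"model": ["3", 0], "positive": ["5", 0], "negative": ["6", 0],
                     "latent_image": ["8", 0], "seed": 333721078257758, "steps": 30,
                     "cfg": 3.5, "sampler_name": "euler", "scheduler": "beta",
                     "denoise": 0.68}},
    "10": {"class_type": "VAEDecode", "inputs": {"samples": ["9", 0], "vae": ["2", 0]}},
    "11": {"class_type": "SaveImage",
           "inputs": {"images": ["10", 0], "filename_prefix": "style_transfer"}},
}


def links(node: dict) -> List[str]:
    """Ids of the nodes this node reads from ([node_id, output_index] values)."""
    return [v[0] for v in node["inputs"].values()
            if isinstance(v, list) and len(v) == 2 and isinstance(v[0], str)]


def topo_order(graph: Dict[str, dict]) -> List[str]:
    """Order nodes so every node follows its inputs; raise on dangling links or cycles."""
    for nid, node in graph.items():
        for src in links(node):
            if src not in graph:
                raise KeyError(f"node {nid} links to missing node {src}")
    order: List[str] = []
    done = set()
    pending = list(graph)
    while pending:
        ready = [n for n in pending if all(s in done for s in links(graph[n]))]
        if not ready:
            raise ValueError("cycle detected")
        for n in ready:
            order.append(n)
            done.add(n)
            pending.remove(n)
    return order


print(" -> ".join(GRAPH[n]["class_type"] for n in topo_order(GRAPH)))
```

To queue a graph like this, a client would POST `{"prompt": GRAPH}` as JSON to a running ComfyUI server's `/prompt` endpoint (by default at `http://127.0.0.1:8188/prompt`). The file names must match what is actually installed.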

🔑 Inputs & Outputs

Input Parameters:

Reference image (e.g., 0012.jpg)
Text prompt
Flux model and LoRA model
Random seed: 333721078257758

Output:

Generated image
Comparison image (reference vs generated)

⚠️ Tips & Considerations

✅ Model Dependencies:
- This workflow relies on the Florence-2 and LoRA models, plus custom nodes for Florence-2, ImageResizeKJ (KJNodes), and Image Comparer (rgthree). Ensure all of them are installed.
⚡ Performance Requirements:
- An NVIDIA GPU with at least 12GB VRAM is recommended for smooth performance.
- Larger output sizes consume more VRAM.
⚠️ Output Variations:
- Different random seeds produce different image results.
- Adjusting the LoRA weight changes how strongly the LoRA's style is applied to the output.
🔥 Optimization Tip:
- For consistent results, use a fixed seed and consistent image dimensions.
