From Photos to Masterpieces: A Workflow for Generating Stylized Images with ControlNet and LoRA
1. Workflow Overview

Purpose: Generate stylized images from input photos (e.g., model poses) using ControlNet for structure preservation and LoRAs for style (e.g., ethnic costumes, autumn forest themes).
Core Models:
基础算法_F.1 (Base Algorithm F.1): Base text-to-image model (likely a FLUX.1 variant, since the ControlNet below targets FLUX.1-dev).
FLUX.1-dev-ControlNet-Union-Pro-InstantX: Multi-ControlNet for pose/structure control.
Meta-Llama-3.1-8B-bnb-4bit: Language model used for image captioning (prompt reverse engineering).
2. Key Nodes
Node Name | Function | Installation | Dependencies |
---|---|---|---|
| Loads ControlNet model. | Manually place in | |
| Applies style LoRAs (ethnic/autumn). | Place files in | |
| Reverse-engineers prompts via Llama-3. | Install | Requires 4bit quantization libs. |
3. Workflow Groups
Reference Image Group
Input: User-uploaded photo (e.g., lQDPKGyzHiAGKAfNB9DNBQOwJksZaqj6fsIH2j_m_4e8AA_1283_2000.jpg).
Process: Generates a depth map via DepthAnythingV2 for ControlNet.
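The depth map handed to ControlNet is a grayscale image. The sketch below illustrates only the normalization step that turns raw depth values into 8-bit gray levels; it is not DepthAnythingV2's code, and the function name `normalize_depth` and the sample values are hypothetical.

```python
def normalize_depth(depth, invert=False):
    """Scale a 2D grid of raw depth values to 8-bit grayscale (0-255)."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant depth map carries no structure; return mid-gray.
        return [[128 for _ in row] for row in depth]
    out = []
    for row in depth:
        scaled = [(v - lo) / span for v in row]
        if invert:
            # Some pipelines expect near = bright, others near = dark.
            scaled = [1.0 - s for s in scaled]
        out.append([round(s * 255) for s in scaled])
    return out


# Hypothetical 2x2 raw depth values (arbitrary units).
raw = [[0.5, 1.0], [2.0, 4.5]]
gray = normalize_depth(raw)
print(gray)  # [[0, 32], [96, 255]]
```

Whether near pixels should be bright or dark depends on the ControlNet being used, which is why the sketch exposes an `invert` flag.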
LoRA Group
Loads two LoRAs: 少数民族服饰_V1.0 (ethnic minority costume, weight=0.2) and 秋日森林_秋天女孩_V1.0 (autumn forest / autumn girl, weight=0.7).
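To make the effect of the two weights concrete, here is a minimal sketch of how stacked LoRAs modify one weight matrix: each LoRA contributes a low-rank product scaled by its weight. The toy matrices are hypothetical, and real loaders also apply an alpha/rank scaling factor and may use separate strengths for the model and the text encoder.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_loras(base, loras):
    """Return base + sum(weight * (up @ down)) over all (up, down, weight)."""
    merged = [row[:] for row in base]
    for up, down, weight in loras:
        delta = matmul(up, down)  # low-rank update for this LoRA
        for i, row in enumerate(delta):
            for j, d in enumerate(row):
                merged[i][j] += weight * d
    return merged


# Toy 2x2 base weight and two rank-1 LoRAs (values are illustrative only).
base = [[1.0, 0.0], [0.0, 1.0]]
costume = ([[1.0], [0.0]], [[1.0, 1.0]], 0.2)  # stands in for the 0.2-weight LoRA
autumn = ([[0.0], [1.0]], [[1.0, 1.0]], 0.7)   # stands in for the 0.7-weight LoRA

merged = apply_loras(base, [costume, autumn])
print(merged)
```

With these weights the autumn LoRA shifts the base weights 3.5 times as strongly as the costume LoRA, which matches the intent of a dominant scene style with a lighter costume influence.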
Generation Group
Output: 1280x2000 image after latent upscaling and VAE decoding.
4. Inputs & Outputs
Inputs:
Reference image (required).
Resolution: Default 768x1024 (adjustable via EmptyLatentImage).
Negative prompt: "Imperfect, non-standard, poor quality".
Output: Stylized model image (e.g., in ethnic costume).
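The resolutions above are pixel sizes, while sampling happens in latent space. As a rough sketch, assuming the usual VAE downscale factor of 8 (an assumption, not stated in the workflow), the latent grid sizes can be computed as follows; `latent_size` is a hypothetical helper.

```python
def latent_size(width, height, downscale=8):
    """Return the latent grid (width, height) for a pixel resolution."""
    if width % downscale or height % downscale:
        raise ValueError(f"{width}x{height} is not divisible by {downscale}")
    return width // downscale, height // downscale


default_latent = latent_size(768, 1024)   # default input resolution
output_latent = latent_size(1280, 2000)   # final output resolution
print(default_latent, output_latent)      # (96, 128) (160, 250)
```

Note that 1280/768 and 2000/1024 are different ratios, so the final image is not a uniform upscale of the default resolution.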
5. Tips & Warnings
⚠️ Errors:
Missing ControlNet/LoRA files trigger "Missing model" errors.
Llama-3 requires ≥8GB VRAM; disable Joy_caption on low-end devices.
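A quick pre-flight check can catch missing files before the workflow is queued. The sketch below is a generic helper, not part of ComfyUI; `find_missing` is a hypothetical name and the paths are placeholders that must be replaced with the actual locations on your machine.

```python
from pathlib import Path


def find_missing(required):
    """Return the labels whose file path does not point to an existing file."""
    return [label for label, path in required.items() if not Path(path).is_file()]


# Placeholder paths: substitute the real locations of your model files.
required = {
    "ControlNet": "path/to/your/controlnet_model_file",
    "Costume LoRA": "path/to/your/costume_lora_file",
    "Autumn LoRA": "path/to/your/autumn_lora_file",
}

missing = find_missing(required)
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found.")
```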
✅ Optimization:
Use fp8_e4m3fn precision to save VRAM.
Adjust ControlNet weight (default: 0.75) in ControlNetApplyAdvanced.
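Both tips can be illustrated with simple arithmetic. The sketch below estimates weight memory at different precisions and models ControlNet strength as a scale applied only inside a start/end window of the sampling progress. The helper names, the 12-billion-parameter figure, and the example window are illustrative assumptions; activations and other runtime overhead are ignored, and the real node logic is more involved.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "fp8_e4m3fn": 1.0,
    "4bit": 0.5,
}


def weight_memory_gib(num_params, precision):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


def controlnet_scale(progress, strength=0.75, start=0.0, end=1.0):
    """Strength applied at a sampling progress in [0, 1]; zero outside the window."""
    return strength if start <= progress <= end else 0.0


# Hypothetical 12-billion-parameter base model.
params = 12_000_000_000
for precision in ("fp16", "fp8_e4m3fn"):
    print(f"{precision}: {weight_memory_gib(params, precision):.1f} GiB")

# Control applied at weight 0.75 for the first 80% of sampling, then released.
for progress in (0.0, 0.5, 0.9):
    print(progress, controlnet_scale(progress, end=0.8))
```

Storing weights in fp8 halves the weight footprint relative to fp16. Lowering the ControlNet weight or ending control early gives the LoRA styles more freedom at the cost of looser adherence to the reference pose.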