Unlocking the Art of Guochao: A Deep Dive into Stable Diffusion Workflow
Workflow Overview
This workflow is an image generation and enhancement pipeline built on Stable Diffusion and designed to produce high-quality illustrations in the Chinese "Guochao" style. It works in four stages:
Generating initial images using Stable Diffusion.
Enhancing details via image blending (ImageBlend) and depth mapping (DepthAnything V2).
Refining images with ControlNet (depth-based) and multiple sampling passes (KSampler).
Upscaling the final image to a higher resolution using RealESRGAN and UltimateSDUpscale.
Core Models
Stable Diffusion: Core generation model, loaded with “锦绣芳华——国潮插画风_v1.0.safetensors” for Guochao-style images.
LoRA (国潮-插画艺术_v1.0.safetensors): Enhances the Guochao illustration style.
CLIP: Text encoder loaded by the CheckpointLoaderSimple node; turns the text prompts into conditioning.
VAE (vae-ft-mse-840000-ema-pruned.ckpt): Encodes images into latent space and decodes latents back into images.
ControlNet (control_v11f1p_sd15_depth.pth): Controls generation using depth maps.
DepthAnything V2 (depth_anything_v2_vitl_fp32.safetensors): Estimates a depth map from the blended image; the depth ControlNet uses it to guide refinement.
RealESRGAN (RealESRGAN_x2.pth and RealESRGAN_x4plus_anime_6B.pth): Super-resolution models for upscaling.
Component Explanation
CheckpointLoaderSimple: Loads Stable Diffusion, CLIP, and VAE.
Installation: Default ComfyUI node.
KSamplerAdvanced: Advanced sampler for generating/refining latent images.
Installation: Default ComfyUI node.
EmptyLatentImage: Creates an empty latent image.
Installation: Default ComfyUI node.
CLIP Positive-Negative (WLSH): Processes positive/negative prompts.
Installation: Install via ComfyUI Manager (WLSH custom nodes).
VAEDecode: Decodes latent images into visible images.
Installation: Default ComfyUI node.
ImageBlend: Blends two images.
Installation: Default ComfyUI node.
DepthAnything_V2: Generates depth maps.
Installation: Install via ComfyUI Manager; model from Hugging Face.
ControlNetLoader and ControlNetApply: Load and apply ControlNet.
Installation: Default nodes; model from Civitai or Hugging Face.
UpscaleModelLoader and ImageUpscaleWithModel: Load and apply upscaling models.
Installation: Default nodes; models from GitHub or RealESRGAN official sources.
HD UltimateSDUpscale: Advanced super-resolution upscaling.
Installation: Install via ComfyUI Manager (UltimateSDUpscale plugin).
AV_VAELoader: Loads an external VAE model.
Installation: Install via ComfyUI Manager (ArtVenture custom nodes, which provide the AV_ prefixed nodes).
Workflow Structure
Initial Image Generation Group
Nodes: EmptyLatentImage → KSamplerAdvanced → VAEDecode
Role: Generates two initial images (512x288 and 512x1024).
Inputs: Prompts (e.g., “Guochao(style), distant mountains”), seed, sampling steps.
Outputs: Two initial images.
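The node chain above can be written in ComfyUI's API (JSON) workflow format, shown here as a Python dict. This is a minimal sketch, not the author's exported graph: the node IDs, step count, CFG, sampler, and scheduler are placeholder assumptions, the core CLIPTextEncode node stands in for the WLSH prompt node, and the checkpoint's built-in VAE is used instead of the externally loaded one.

```python
import json

CKPT = "锦绣芳华——国潮插画风_v1.0.safetensors"  # checkpoint named in the article


def initial_generation_graph(width, height, seed, steps=20):
    """Build EmptyLatentImage -> KSamplerAdvanced -> VAEDecode in API format."""
    return {
        # CheckpointLoaderSimple outputs: 0 = MODEL, 1 = CLIP, 2 = VAE.
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": CKPT}},
        # Assumption: core CLIPTextEncode replaces the WLSH prompt node.
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "Guochao(style), distant mountains",
                         "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "mankind, monochrome", "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        # steps, cfg, sampler_name and scheduler are placeholders.
        "5": {"class_type": "KSamplerAdvanced",
              "inputs": {"model": ["1", 0],
                         "positive": ["2", 0],
                         "negative": ["3", 0],
                         "latent_image": ["4", 0],
                         "add_noise": "enable",
                         "noise_seed": seed,
                         "steps": steps,
                         "cfg": 7.0,
                         "sampler_name": "euler",
                         "scheduler": "normal",
                         "start_at_step": 0,
                         "end_at_step": steps,
                         "return_with_leftover_noise": "disable"}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    }


# One graph per initial image size used in the workflow.
graphs = [initial_generation_graph(512, 288, seed=1),
          initial_generation_graph(512, 1024, seed=1)]
print(json.dumps(graphs[0]["4"], ensure_ascii=False))
```

Each `["id", n]` pair is a link to output `n` of another node, which is how the arrows in the node chain are expressed in this format.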
Image Blending and Depth Enhancement Group
Nodes: ImageBlend → DepthAnything_V2 → VAEEncode
Role: Blends two initial images and generates a depth map, then encodes back to latent space.
Inputs: Two initial images, blend factor (0.38).
Outputs: Blended latent image and depth map.
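To make the blend factor concrete, here is a small sketch of a linear ("normal" mode) blend on plain Python lists. The article does not state which blend mode the workflow uses, so the normal mode is an assumption, and the function names are illustrative rather than part of ComfyUI. Because the two initial images differ in size, one would have to be resized first; the sketch raises an error instead of guessing how.

```python
def blend_pixel(pixel_a, pixel_b, blend_factor):
    """Linear mix of two RGB pixels with channel values in [0.0, 1.0]."""
    return tuple(a * (1.0 - blend_factor) + b * blend_factor
                 for a, b in zip(pixel_a, pixel_b))


def blend_images(image_a, image_b, blend_factor=0.38):
    """Blend two images given as rows of RGB tuples; sizes must match."""
    same_rows = len(image_a) == len(image_b)
    same_cols = all(len(ra) == len(rb) for ra, rb in zip(image_a, image_b))
    if not (same_rows and same_cols):
        raise ValueError("images must have the same size; resize one first")
    return [[blend_pixel(pa, pb, blend_factor) for pa, pb in zip(ra, rb)]
            for ra, rb in zip(image_a, image_b)]


black = [[(0.0, 0.0, 0.0)] * 2 for _ in range(2)]
white = [[(1.0, 1.0, 1.0)] * 2 for _ in range(2)]
mixed = blend_images(black, white)
print(mixed[0][0])  # (0.38, 0.38, 0.38)
```

With a factor of 0.38 the first image keeps 62% of its weight, so it dominates the result while the second image is layered in more lightly.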
ControlNet Refinement Group
Nodes: ControlNetLoader → ControlNetApply → KSamplerAdvanced
Role: Refines the image using a depth-based ControlNet.
Inputs: Depth map, prompts, ControlNet strength (0.7).
Outputs: Refined latent image.
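The refinement group can be sketched in ComfyUI's API workflow format as follows. The node IDs and the upstream references (`["1", 0]`, `["10", 0]`, and so on) are hypothetical placeholders for nodes defined elsewhere in the graph, and the sampler settings are assumptions; only the ControlNet file name and the strength of 0.7 come from the article.

```python
def controlnet_refine_graph(model, positive, negative, depth_image, latent,
                            seed, strength=0.7, steps=20):
    """Build ControlNetLoader -> ControlNetApply -> KSamplerAdvanced.

    Each reference argument is a [node_id, output_index] pair that points
    at a node defined elsewhere in the workflow.
    """
    return {
        "20": {"class_type": "ControlNetLoader",
               "inputs": {"control_net_name": "control_v11f1p_sd15_depth.pth"}},
        # The depth map conditions the positive prompt only.
        "21": {"class_type": "ControlNetApply",
               "inputs": {"conditioning": positive,
                          "control_net": ["20", 0],
                          "image": depth_image,
                          "strength": strength}},
        # Sampler values below are placeholders, not the article's settings.
        "22": {"class_type": "KSamplerAdvanced",
               "inputs": {"model": model,
                          "positive": ["21", 0],
                          "negative": negative,
                          "latent_image": latent,
                          "add_noise": "enable",
                          "noise_seed": seed,
                          "steps": steps,
                          "cfg": 7.0,
                          "sampler_name": "euler",
                          "scheduler": "normal",
                          "start_at_step": 0,
                          "end_at_step": steps,
                          "return_with_leftover_noise": "disable"}},
    }


graph = controlnet_refine_graph(model=["1", 0], positive=["2", 0],
                                negative=["3", 0], depth_image=["10", 0],
                                latent=["11", 0], seed=1)
print(graph["21"]["inputs"]["strength"])
```

Lowering `strength` lets the sampler drift further from the depth map; raising it holds the composition closer to the blended image's layout.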
Super-Resolution Upscaling Group
Nodes: UpscaleModelLoader → ImageUpscaleWithModel → HD UltimateSDUpscale
Role: Progressively upscales the image to a higher resolution.
Inputs: Refined image, upscaling models.
Outputs: Final high-resolution image.
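A quick way to reason about the cost of this stage is to work out the output size and the number of tiles a tiled upscaler has to process. The sketch below assumes a 512x1024 refined image, a single 2x pass, and 512-pixel tiles; the article does not specify the order of the upscaling passes or the tile size, so treat these numbers as illustrative.

```python
import math


def upscaled_size(width, height, factor):
    """Image size after upscaling both sides by the same factor."""
    return round(width * factor), round(height * factor)


def tile_grid(width, height, tile=512):
    """Columns and rows of tiles needed to cover the image."""
    return math.ceil(width / tile), math.ceil(height / tile)


w, h = upscaled_size(512, 1024, 2)  # a 2x model doubles each side
cols, rows = tile_grid(w, h)
print(w, h, cols * rows)  # 1024 2048 8
```

Each tile is a separate sampling pass, so doubling the scale factor roughly quadruples the tile count and the time spent in this group.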
Inputs and Outputs
Inputs:
Positive prompts: e.g., “Main building, clouds, sky, mountains, guochaochahua”.
Negative prompts: e.g., “mankind, monochrome”.
Resolution: Initial 512x288 and 512x1024, upscaled to higher resolution.
Seed: Random or specified.
Outputs: A high-quality Guochao-style illustration image (PNG format after upscaling).
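When the workflow is run through ComfyUI's HTTP API rather than the UI, these inputs are just fields inside the API-format graph. The sketch below patches prompts and seed into a template and builds the JSON body that the `/prompt` endpoint expects under a `"prompt"` key. The node IDs and the tiny template are hypothetical stand-ins for a real exported workflow, and nothing is actually sent.

```python
import copy
import json
import random


def with_inputs(workflow, positive, negative, seed=None,
                positive_id="2", negative_id="3", sampler_id="5"):
    """Return a copy of the workflow with prompts and seed filled in."""
    patched = copy.deepcopy(workflow)
    patched[positive_id]["inputs"]["text"] = positive
    patched[negative_id]["inputs"]["text"] = negative
    # A missing seed means "random", matching the article's input list.
    if seed is None:
        seed = random.randrange(2 ** 32)
    patched[sampler_id]["inputs"]["noise_seed"] = seed
    return patched


# Hypothetical template: only the fields being patched are shown.
template = {
    "2": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "3": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "5": {"class_type": "KSamplerAdvanced", "inputs": {"noise_seed": 0}},
}

job = with_inputs(template,
                  "Main building, clouds, sky, mountains, guochaochahua",
                  "mankind, monochrome",
                  seed=42)
payload = json.dumps({"prompt": job}, ensure_ascii=False)
print(job["5"]["inputs"]["noise_seed"])  # 42
```

Copying the template before patching keeps it reusable, which is convenient when queueing several seeds or prompt variants in a row.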
Notes and Considerations
Errors: Ensure correct model paths to avoid “model not found” issues.
Performance: Run the models in FP16 where possible; half-precision weights take roughly half the memory of FP32.
Compatibility: The WLSH and UltimateSDUpscale custom nodes require an up-to-date ComfyUI installation; update ComfyUI before installing them.
Resources: At least 12 GB of GPU memory is recommended; the workflow can run on CPU, but slowly.
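Most "model not found" errors can be caught before launching the workflow by checking that each file is where its loader node looks for it. The sketch below assumes a standard ComfyUI install with the usual `models/` subfolders and a root folder named `ComfyUI`; adjust both to your setup. The DepthAnything V2 weights are left out because their folder depends on the custom node that loads them.

```python
from pathlib import Path

# File names from the article, grouped by the assumed models/ subfolder.
EXPECTED = {
    "checkpoints": ["锦绣芳华——国潮插画风_v1.0.safetensors"],
    "loras": ["国潮-插画艺术_v1.0.safetensors"],
    "vae": ["vae-ft-mse-840000-ema-pruned.ckpt"],
    "controlnet": ["control_v11f1p_sd15_depth.pth"],
    "upscale_models": ["RealESRGAN_x2.pth",
                       "RealESRGAN_x4plus_anime_6B.pth"],
}


def missing_models(comfy_root):
    """List expected model files that are absent under <root>/models."""
    models_dir = Path(comfy_root) / "models"
    return [str(Path(folder) / name)
            for folder, names in EXPECTED.items()
            for name in names
            if not (models_dir / folder / name).is_file()]


# "ComfyUI" is an assumed install location relative to the current directory.
for relative_path in missing_models("ComfyUI"):
    print("missing:", relative_path)
```

File names must match exactly, including the non-ASCII characters in the checkpoint and LoRA names, since the loader nodes select models by file name.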