Unlocking the Art of Guochao: A Deep Dive into a Stable Diffusion Workflow

ComfyUI.org
2025-03-17 10:51:00

Workflow Overview


This workflow is an image generation and enhancement pipeline built on Stable Diffusion, designed to produce high-quality Chinese "Guochao" (national-trend) style illustrations. It runs in four main stages (a minimal API-format sketch follows this list):

  1. Generating two initial images with Stable Diffusion.

  2. Enhancing details via image blending (ImageBlend) and depth mapping (DepthAnything V2).

  3. Refining images with ControlNet (depth-based) and multiple sampling passes (KSampler).

  4. Upscaling the final image to a higher resolution using RealESRGAN and UltimateSDUpscale.
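
The whole graph can also be queued programmatically. Below is a minimal sketch, assuming a local ComfyUI instance on its default port 8188 and ComfyUI's API ("prompt") JSON format; the two nodes shown are only a stand-in for the full graph described in the rest of this article.

    import json
    import urllib.request

    # Abbreviated API-format graph: each node id maps to {"class_type", "inputs"}.
    prompt = {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "锦绣芳华——国潮插画风_v1.0.safetensors"}},
        "2": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 288, "batch_size": 1}},
        # ... sampling, blending, ControlNet, and upscaling nodes omitted here
    }

    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))  # server returns the queued prompt id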

Core Models

  • Stable Diffusion: Core generation model, loaded with “锦绣芳华——国潮插画风_v1.0.safetensors” for Guochao-style images.

  • LoRA (国潮-插画艺术_v1.0.safetensors): Enhances the Guochao illustration style.

  • CLIP: Loaded by the CheckpointLoaderSimple node and used to encode the text prompts.

  • VAE (vae-ft-mse-840000-ema-pruned.ckpt): Encodes images into and decodes them from latent space.

  • ControlNet (control_v11f1p_sd15_depth.pth): Controls generation using depth maps.

  • DepthAnything V2 (depth_anything_v2_vitl_fp32.safetensors): Generates depth maps for enhanced 3D effects.

  • RealESRGAN (RealESRGAN_x2.pth and RealESRGAN_x4plus_anime_6B.pth): Super-resolution models for upscaling. A quick path check for all of the above is sketched after this list.
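
These models live in ComfyUI's standard model folders. Here is a minimal path-check sketch, assuming a default folder layout; the DepthAnything V2 weights are usually fetched by its custom node and are omitted here.

    from pathlib import Path

    COMFYUI = Path("ComfyUI")  # adjust to your install location
    expected = {
        "models/checkpoints":    ["锦绣芳华——国潮插画风_v1.0.safetensors"],
        "models/loras":          ["国潮-插画艺术_v1.0.safetensors"],
        "models/vae":            ["vae-ft-mse-840000-ema-pruned.ckpt"],
        "models/controlnet":     ["control_v11f1p_sd15_depth.pth"],
        "models/upscale_models": ["RealESRGAN_x2.pth", "RealESRGAN_x4plus_anime_6B.pth"],
    }
    for folder, names in expected.items():
        for name in names:
            path = COMFYUI / folder / name
            print(("OK      " if path.exists() else "MISSING ") + str(path))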

Component Explanation

  1. CheckpointLoaderSimple: Loads Stable Diffusion, CLIP, and VAE.

    • Installation: Default ComfyUI node.

  2. KSamplerAdvanced: Advanced sampler for generating/refining latent images.

    • Installation: Default ComfyUI node.

  3. EmptyLatentImage: Creates an empty latent image.

    • Installation: Default ComfyUI node.

  4. CLIP Positive-Negative (WLSH): Processes positive/negative prompts.

    • Installation: Install via ComfyUI Manager (WLSH custom nodes).

  5. VAEDecode: Decodes latent images into visible images.

    • Installation: Default ComfyUI node.

  6. ImageBlend: Blends two images.

    • Installation: Default ComfyUI node.

  7. DepthAnything_V2: Generates depth maps.

    • Installation: Install via ComfyUI Manager; model from Hugging Face.

  8. ControlNetLoader and ControlNetApply: Load and apply ControlNet.

    • Installation: Default nodes; model from Civitai or Hugging Face.

  9. UpscaleModelLoader and ImageUpscaleWithModel: Load and apply upscaling models.

    • Installation: Default nodes; models from GitHub or RealESRGAN official sources.

  10. HD UltimateSDUpscale: Tile-based Stable Diffusion re-sampling for advanced super-resolution upscaling.

    • Installation: Install via ComfyUI Manager (UltimateSDUpscale plugin).

  11. AV_VAELoader: Loads an external VAE model.

    • Installation: Install via ComfyUI Manager (the AV_ nodes are part of the comfyui-art-venture pack). A sketch of how these nodes are wired in ComfyUI's API format follows this list.
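
In ComfyUI's API format, every node is keyed by an id, and an input wired from another node is written as [source_id, output_index]. The sketch below shows the loading and prompt-encoding nodes; node ids are arbitrary, the LoRA is assumed to be applied through a standard LoraLoader (the published workflow does not name its LoRA node), and two plain CLIPTextEncode nodes stand in for the WLSH positive/negative node.

    loaders = {
        "1": {"class_type": "CheckpointLoaderSimple",     # outputs: MODEL, CLIP, VAE
              "inputs": {"ckpt_name": "锦绣芳华——国潮插画风_v1.0.safetensors"}},
        "2": {"class_type": "LoraLoader",                 # assumed; patches MODEL and CLIP
              "inputs": {"model": ["1", 0], "clip": ["1", 1],
                         "lora_name": "国潮-插画艺术_v1.0.safetensors",
                         "strength_model": 1.0, "strength_clip": 1.0}},
        "3": {"class_type": "CLIPTextEncode",             # stand-in for the WLSH node (positive)
              "inputs": {"clip": ["2", 1],
                         "text": "Main building, clouds, sky, mountains, guochaochahua"}},
        "4": {"class_type": "CLIPTextEncode",             # stand-in for the WLSH node (negative)
              "inputs": {"clip": ["2", 1], "text": "mankind, monochrome"}},
    }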

Workflow Structure

  1. Initial Image Generation Group

    • Nodes: EmptyLatentImage → KSamplerAdvanced → VAEDecode (sketched below)

    • Role: Generates two initial images (512x288 and 512x1024).

    • Inputs: Prompts (e.g., “Guochao(style), distant mountains”), seed, sampling steps.

    • Outputs: Two initial images.
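
A sketch of this group in API format, continuing the node ids above. Only the 512x288 branch is shown, and the sampler settings (seed, steps, cfg, sampler, scheduler) are illustrative placeholders rather than the workflow's exact values.

    group1 = {
        "10": {"class_type": "EmptyLatentImage",
               "inputs": {"width": 512, "height": 288, "batch_size": 1}},
        "11": {"class_type": "KSamplerAdvanced",
               "inputs": {"model": ["2", 0], "positive": ["3", 0], "negative": ["4", 0],
                          "latent_image": ["10", 0],
                          "add_noise": "enable", "noise_seed": 123456,
                          "steps": 25, "cfg": 7.0,
                          "sampler_name": "dpmpp_2m", "scheduler": "karras",
                          "start_at_step": 0, "end_at_step": 25,
                          "return_with_leftover_noise": "disable"}},
        "12": {"class_type": "VAEDecode",                 # first initial image (512x288)
               "inputs": {"samples": ["11", 0], "vae": ["1", 2]}},
        # a parallel 512x1024 branch (say nodes "13".."15") produces the second image
    }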

  2. Image Blending and Depth Enhancement Group

    • Nodes: ImageBlend → DepthAnything_V2 → VAEEncode (sketched below)

    • Role: Blends the two initial images, generates a depth map from the blend, and encodes the blended image back into latent space.

    • Inputs: Two initial images, blend factor (0.38).

    • Outputs: Blended latent image and depth map.
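
A sketch of this group, assuming the second branch's decoded image is node "15". ImageBlend and VAEEncode are default nodes; the depth node's exact class name and input names depend on which DepthAnything V2 custom node pack is installed, so node "22" is only a placeholder.

    group2 = {
        "20": {"class_type": "ImageBlend",
               "inputs": {"image1": ["12", 0], "image2": ["15", 0],
                          "blend_factor": 0.38, "blend_mode": "normal"}},
        "21": {"class_type": "VAEEncode",                 # blended image back to latent space
               "inputs": {"pixels": ["20", 0], "vae": ["1", 2]}},
        "22": {"class_type": "DepthAnything_V2",          # placeholder class name and inputs
               "inputs": {"image": ["20", 0]}},
    }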

  3. ControlNet Refinement Group

    • Nodes: ControlNetLoader → ControlNetApply → KSamplerAdvanced (sketched below)

    • Role: Refines the image using a depth-based ControlNet.

    • Inputs: Depth map, prompts, ControlNet strength (0.7).

    • Outputs: Refined latent image.
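
A sketch of the refinement group. The strength of 0.7 matches the value quoted above; the other sampler settings are placeholders. Note that the basic ControlNetApply node conditions only the positive prompt.

    group3 = {
        "30": {"class_type": "ControlNetLoader",
               "inputs": {"control_net_name": "control_v11f1p_sd15_depth.pth"}},
        "31": {"class_type": "ControlNetApply",
               "inputs": {"conditioning": ["3", 0], "control_net": ["30", 0],
                          "image": ["22", 0], "strength": 0.7}},
        "32": {"class_type": "KSamplerAdvanced",          # re-samples the blended latent
               "inputs": {"model": ["2", 0], "positive": ["31", 0], "negative": ["4", 0],
                          "latent_image": ["21", 0],
                          "add_noise": "enable", "noise_seed": 123456,
                          "steps": 25, "cfg": 7.0,
                          "sampler_name": "dpmpp_2m", "scheduler": "karras",
                          "start_at_step": 10, "end_at_step": 25,
                          "return_with_leftover_noise": "disable"}},
    }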

  4. Super-Resolution Upscaling Group

    • Nodes: UpscaleModelLoader → ImageUpscaleWithModel → HD UltimateSDUpscale (sketched below)

    • Role: Progressively upscales the image to a higher resolution.

    • Inputs: Refined image, upscaling models.

    • Outputs: Final high-resolution image.
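
A sketch of the upscaling chain. UpscaleModelLoader and ImageUpscaleWithModel are default nodes; the UltimateSDUpscale node from the plugin named above takes many more inputs (tile size, denoise, seam-fix settings, the model, prompts, and VAE), so it is only indicated in a comment.

    group4 = {
        "33": {"class_type": "VAEDecode",                 # decode the refined latent
               "inputs": {"samples": ["32", 0], "vae": ["1", 2]}},
        "40": {"class_type": "UpscaleModelLoader",
               "inputs": {"model_name": "RealESRGAN_x2.pth"}},
        "41": {"class_type": "ImageUpscaleWithModel",
               "inputs": {"upscale_model": ["40", 0], "image": ["33", 0]}},
        # node "42": an UltimateSDUpscale pass over the enlarged image would follow,
        # typically using RealESRGAN_x4plus_anime_6B.pth as its upscale model
    }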

Inputs and Outputs

  • Inputs:

    • Positive prompts: e.g., “Main building, clouds, sky, mountains, guochaochahua”.

    • Negative prompts: e.g., “mankind, monochrome”.

    • Resolution: Initial renders at 512x288 and 512x1024, upscaled to a higher final resolution.

    • Seed: Random or specified.

  • Outputs: A high-quality Guochao-style illustration, saved as a PNG after upscaling (seed handling and saving are sketched below).
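
A small sketch of the remaining input/output plumbing: a seed that can be randomized or pinned, and a SaveImage node that writes the final PNG. The filename prefix is illustrative.

    import random

    seed = random.randint(0, 2**32 - 1)   # or pin a fixed value for reproducibility

    outputs = {
        "50": {"class_type": "SaveImage",
               "inputs": {"images": ["41", 0], "filename_prefix": "guochao_final"}},
    }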

Notes and Considerations

  • Errors: Ensure correct model paths to avoid “model not found” issues.

  • Performance: Use FP16 precision to reduce memory usage.

  • Compatibility: The WLSH and UltimateSDUpscale custom node packs require an up-to-date ComfyUI version.

  • Resources: At least 12GB of GPU memory is recommended; CPU execution is possible but very slow. A quick memory check is sketched below.
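
A quick check against the 12GB suggestion, using torch, which ComfyUI already depends on:

    import torch

    if torch.cuda.is_available():
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        status = "meets" if total_gb >= 12 else "is below"
        print(f"GPU memory: {total_gb:.1f} GB ({status} the suggested 12 GB)")
    else:
        print("No CUDA GPU detected; expect very slow CPU execution.")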