Master Depth Control and Style Transfer with this Cutting-Edge Workflow

ComfyUI.org
2025-03-25 11:16:10

1. Workflow Overview

This workflow performs style transfer under depth control: a depth map extracted from the source image constrains the composition, while a reference image supplies the style. It combines:

  • Depth-guided image generation

  • AI style transfer

  • Automatic prompt generation

  • High-quality output rendering

Key applications include portrait enhancement, artistic style transfer, and controlled image generation.

2. Core Models

| Model | Function | Source |
| --- | --- | --- |
| F.1_Depth-fp16_1.0 | Base UNet for depth-aware generation | Custom |
| flux1-redux-dev | Style transfer model | Requires installation |
| HyperL-F.1-加速器-PAseer_加速FLUX_AcceleratorV3.1 (LoRA) | Generation accelerator | Requires installation |
| depth_anything_v2_vitl.pth | Depth estimation | ControlNet Aux |
| unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit | Prompt generation | Joy_caption_two |

3. Key Components

| Node | Function | Installation |
| --- | --- | --- |
| DepthAnythingV2Preprocessor | Depth map generation | ComfyUI-ControlNet-Aux |
| StyleModelLoader | Loads style transfer model | Built-in |
| Joy_caption_two | AI-powered prompt generation | Custom/GitHub |
| InstructPixToPixConditioning | Image-to-image conditioning | Built-in |
| FluxGuidance | Enhanced guidance scaling | Built-in |
| DF_Image_scale_to_side | Smart image scaling | Derfuu's Modded Nodes |

4. Workflow Structure

  1. Input Section:

    • "照片——放这里" ("Photo: place here"): Source image input (658×1170)

    • "风格参考图——放这里" ("Style reference image: place here"): Style reference image (452×600)

  2. Prompt Generation:

    • Uses Joy_caption_two with Llama 3.1 8B Instruct (unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit) to analyze the input images and generate descriptive prompts

  3. Depth Control:

    • Creates depth maps for controlled generation

    • Processes at 1216px resolution

  4. Style Transfer:

    • Applies style model with CLIP vision encoding

    • Uses flux1-redux-dev style model

  5. Generation:

    • 8 sampling steps with the Euler sampler

    • 1216px output resolution
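
The resolution figures in the structure above can be checked with a little arithmetic. The sketch below assumes the scaling step resizes an image so that its longest side equals the 1216px target while preserving the aspect ratio. The function name `scale_to_side` and the rounding rule are illustrative assumptions, not the actual implementation of DF_Image_scale_to_side.

```python
def scale_to_side(width: int, height: int, target: int = 1216) -> tuple[int, int]:
    """Return new (width, height) with the longest side equal to `target`.

    Assumptions (not confirmed by the workflow itself): the longest side is
    the one matched to the target, and the other side is rounded to the
    nearest whole pixel.
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")
    scale = target / max(width, height)
    new_w = target if width >= height else round(width * scale)
    new_h = target if height > width else round(height * scale)
    return new_w, new_h


if __name__ == "__main__":
    # Sizes quoted for the two input nodes in this workflow.
    print("source image:", scale_to_side(658, 1170))     # (684, 1216)
    print("style reference:", scale_to_side(452, 600))   # (916, 1216)
```

Under these assumptions, the 658×1170 source photo would be processed at roughly 684×1216, which is the size the depth map and the sampler would then work with.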

5. Input/Output

Inputs:

  • Source image (PNG/JPG)

  • Style reference image (optional)

  • Prompt (can be AI-generated or manual)

Outputs:

  • Stylized image with depth control

  • Intermediate depth maps

  • Generated prompts

6. Technical Notes

  • Requires significant VRAM (12 GB or more recommended)

  • Uses fp8 precision for some models

  • Depth processing at 1216px may be memory-intensive

  • The style model is applied using multiply blending
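
To see why precision matters for the VRAM recommendation, a rough weights-only estimate can help. The sketch below is a back-of-the-envelope calculation, not a measurement: the parameter counts (about 12 billion for the FLUX.1 transformer and about 8 billion for Llama 3.1 8B) are assumptions taken from the public model descriptions rather than from this article, and the estimate ignores activations, text encoders, the VAE, and quantization overhead.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "4bit": 0.5}


def weights_gib(num_params: float, precision: str) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    # Assumed, approximate parameter counts (not stated in this article).
    flux_params = 12e9
    llama_params = 8e9

    for precision in ("fp16", "fp8"):
        print(f"FLUX transformer @ {precision}: {weights_gib(flux_params, precision):.1f} GiB")
    print(f"Llama 3.1 8B @ 4bit: {weights_gib(llama_params, '4bit'):.1f} GiB")
```

On these assumptions the diffusion model's weights alone take roughly 22 GiB at fp16 and roughly 11 GiB at fp8, which is consistent with the notes above that some models run at fp8 and that 12 GB or more of VRAM is recommended.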

7. Installation Requirements

  1. Required custom nodes:

    • ComfyUI-ControlNet-Aux (for depth processor)

    • Derfuu's Modded Nodes (for smart scaling)

    • Joy_caption_two (for prompt generation)

  2. Model downloads needed:

    • depth_anything_v2_vitl.pth

    • flux1-redux-dev style model

    • HyperL-F.1 LoRA
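
Before loading the workflow, it can be useful to confirm that the files listed above are in place. The following is a minimal sketch of such a check. Every path in `EXPECTED_PATHS` is a hypothetical placeholder (folder layout, file names, and extensions all vary between installations), and `find_missing` is an illustrative helper, not part of ComfyUI.

```python
from pathlib import Path

# Hypothetical locations relative to a ComfyUI install; adjust to your setup.
EXPECTED_PATHS = [
    "custom_nodes/comfyui_controlnet_aux",              # assumed folder name
    "models/style_models/flux1-redux-dev.safetensors",  # assumed file name
    "models/loras/HyperL-F.1.safetensors",              # assumed file name
]


def find_missing(root: str, relative_paths: list[str]) -> list[str]:
    """Return the entries of `relative_paths` that do not exist under `root`."""
    base = Path(root)
    return [p for p in relative_paths if not (base / p).exists()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder for the actual installation directory.
    missing = find_missing("ComfyUI", EXPECTED_PATHS)
    if missing:
        print("Missing items:")
        for item in missing:
            print("  -", item)
    else:
        print("All expected files and folders were found.")
```

The check only tests for existence; it does not verify that a file is complete or that a custom node's Python dependencies are installed.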