Unlock Anime-Style Image Generation with Multi-LoRA Fusion and Reference Image Guidance

ComfyUI.org
2025-05-27 04:52:23

1. Workflow Overview

  • Purpose: Anime-style image generation with reference image guidance and multi-LoRA fusion.

  • Key Features:

    • Image Captioning: The Joy_caption_two node generates a text prompt from the input image.

    • Multi-LoRA Stacking: Three LoRAs (Dark Fantasy, Beauty CG, Ancient Style), each at strength 0.7.

    • Bilingual Support: Automatic translation via LibLibTranslate.


2. Core Models

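| Model Name | Function |
| --- | --- |
| Meta-Llama-3.1-8B | Language model used by the Joy_caption_two captioner to turn the input image into a prompt (loaded by Joy_caption_two_load). |
| FLUX.1 dev | Base model (F.1-fp8 11G版_flux1-dev, an fp8 checkpoint of roughly 11 GB, loaded by UNETLoader). |
| LoRA Ensemble | 暗黑血统:狂战士_1.0 (Dark Fantasy) + F.1色孽CG唯美_V1 (Beauty CG) + F.1超清唯美风-古风篇_v1.0 (Ancient Style), all at 0.7 strength. |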


3. Key Nodes

3.1 Image Input & Captioning

  • LoadImage: Loads reference image (e.g., 00059-4156590861.jpg).

  • Joy_caption_two:

    • Function: Generates English prompts via Meta-Llama.

    • Install: Requires the comfyui_slk_joy_caption_two plugin (installable via ComfyUI Manager).

3.2 Prompt Processing

  • JoinStrings: Merges user keywords (e.g., miluo_cjsj, cloth) with auto-generated prompts.

  • LibLibTranslate: Optional English-to-Chinese translation.
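The merge step in 3.2 amounts to string concatenation. Here is a minimal Python sketch, assuming a comma delimiter and that the user keywords come first. The function name `join_prompt` and the sample caption are illustrative and are not the JoinStrings node's actual implementation.

```python
def join_prompt(user_keywords: str, auto_caption: str, delimiter: str = ", ") -> str:
    """Concatenate user keywords and an auto-generated caption into one prompt.

    Empty or whitespace-only parts are skipped, so the result never starts
    or ends with a dangling delimiter.
    """
    parts = [p.strip() for p in (user_keywords, auto_caption)]
    return delimiter.join(p for p in parts if p)


if __name__ == "__main__":
    # The caption text below is a made-up stand-in for Joy_caption_two output.
    print(join_prompt("miluo_cjsj, cloth", "a red-haired girl in a neon-lit street"))
```

Placing the keywords first keeps trigger words such as miluo_cjsj at the start of the prompt, in line with the priority the workflow gives to user keywords.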

3.3 Multi-LoRA Fusion

  • LoraLoaderModelOnly:

    • Chained Loading: The three LoRAs are applied sequentially, each at strength=0.7 (value supplied via ReroutePrimitive).

    • Model Source: Place .safetensors files in models/loras.
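To make the chaining concrete, the sketch below builds the three loaders as a ComfyUI API-format workflow fragment, where each node's `model` input points at the MODEL output of the previous node. The node ids, the helper name `build_lora_chain`, and the LoRA file names in the usage example are placeholders, not values taken from the original workflow.

```python
def build_lora_chain(base_node_id, lora_names, strength=0.7, first_id=100):
    """Return (nodes, last_id) for a chain of LoraLoaderModelOnly nodes.

    Each loader consumes output 0 (MODEL) of the previous node, so the LoRAs
    are applied one after another on top of the base model.
    """
    nodes = {}
    prev_id = str(base_node_id)
    for offset, name in enumerate(lora_names):
        node_id = str(first_id + offset)  # hypothetical node ids
        nodes[node_id] = {
            "class_type": "LoraLoaderModelOnly",
            "inputs": {
                "model": [prev_id, 0],      # link: [source node id, output index]
                "lora_name": name,          # file located in models/loras
                "strength_model": strength,
            },
        }
        prev_id = node_id
    return nodes, prev_id


if __name__ == "__main__":
    # Placeholder file names; substitute the actual .safetensors files.
    loras = ["lora_1.safetensors", "lora_2.safetensors", "lora_3.safetensors"]
    chain, last_id = build_lora_chain("1", loras)  # "1" stands for the UNETLoader node
    print(last_id, chain[last_id]["inputs"]["model"])
```

The id returned as `last_id` is the node whose MODEL output would feed the sampler.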

3.4 Generation & Output

  • KSampler:

    • Settings: Euler sampler, 20 steps, CFG=3.5, random seed.

  • VAEDecode: Uses ae.sft VAE for latent decoding.
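The sampler settings above can be written as an API-format KSampler node. This is a sketch under assumptions: the source does not state the scheduler or denoise value, so `"simple"` and `1.0` are guesses, and the seed range is deliberately conservative. The helper name and the node ids in the usage example are hypothetical.

```python
import random


def ksampler_node(model, positive, negative, latent, seed=None):
    """Build a KSampler node dict with the settings listed in section 3.4."""
    if seed is None:
        # "Random seed": draw a fresh value per run (conservative 32-bit range).
        seed = random.randint(0, 2**32 - 1)
    return {
        "class_type": "KSampler",
        "inputs": {
            "model": model,
            "positive": positive,
            "negative": negative,
            "latent_image": latent,
            "seed": seed,
            "steps": 20,
            "cfg": 3.5,
            "sampler_name": "euler",
            "scheduler": "simple",  # assumption: not specified in the source
            "denoise": 1.0,         # assumption: full denoise from an empty latent
        },
    }


if __name__ == "__main__":
    # Node ids are placeholders for the fused model, prompts, and empty latent.
    node = ksampler_node(["102", 0], ["6", 0], ["7", 0], ["5", 0], seed=42)
    print(node["inputs"]["steps"], node["inputs"]["cfg"], node["inputs"]["seed"])
```

Passing a fixed `seed` makes a run reproducible, which helps when comparing CFG values.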


4. Workflow Structure

| Group Name | Key Nodes | I/O Description |
| --- | --- | --- |
| Global Control | LoadImage, PrimitiveNode | Input: Reference image, resolution (1200x1200), keywords. |
| LoRA Models | 3x LoraLoaderModelOnly | Output: Fused model (strength=0.7). |
| Generation | KSampler, VAEDecode | Output: Final image (PreviewImage). |


5. Inputs & Outputs

  • Inputs:

    • Reference image (JPEG/PNG).

    • Resolution: Set via EmptyLatentImage (default 1200x1200).

    • Keywords: e.g., miluo_cjsj, cloth (these take priority over the auto-generated prompt).

  • Output:

    • Generated image (anime cyberpunk style, red-haired character).
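When changing the resolution, it helps to check how the pixel size maps to the latent grid. The sketch below assumes the usual 8x spatial downscale between image and latent that EmptyLatentImage applies. The helper name is illustrative.

```python
def latent_size(width: int, height: int, factor: int = 8) -> tuple[int, int]:
    """Return the latent (width, height) for a given pixel resolution.

    Raises ValueError when a side is not a multiple of the downscale factor,
    since such a resolution cannot be mapped exactly onto the latent grid.
    """
    if width % factor or height % factor:
        raise ValueError(f"{width}x{height} is not a multiple of {factor}")
    return width // factor, height // factor


if __name__ == "__main__":
    print(latent_size(1200, 1200))  # (150, 150)
```

The default 1200x1200 therefore corresponds to a 150x150 latent.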


6. Notes

  1. VRAM: A GPU with at least 12 GB of VRAM is recommended (multi-LoRA is resource-intensive).

  2. Dependencies:

    • Download Meta-Llama-3.1-8B and 3 LoRA models manually.

    • Place ae.sft VAE in models/vae.

  3. Troubleshooting:

    • If captioning fails, check the model path configured in Joy_caption_two.

    • Adjust CFG (currently 3.5) if images are blurry.
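Many loading failures come down to a file missing from the expected folder. A small pre-flight check along the following lines can list what is absent before the workflow is queued. It assumes the standard models/vae and models/loras layout mentioned above. The function name, the ComfyUI root path, and the LoRA file names are placeholders.

```python
from pathlib import Path


def missing_model_files(comfy_root, lora_files, vae_file="ae.sft"):
    """Return the expected model paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    expected = [root / "models" / "vae" / vae_file]
    expected += [root / "models" / "loras" / name for name in lora_files]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Placeholder root and file names; adjust to the local installation.
    loras = ["lora_1.safetensors", "lora_2.safetensors", "lora_3.safetensors"]
    for path in missing_model_files("ComfyUI", loras):
        print("missing:", path)
```

An empty result means every listed file was found. It does not verify that the files are the correct models.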