From Reference to Reality: AI-Driven Character Style Transfer and Prompt Reverse Engineering

ComfyUI.org
2025-04-16 14:46:59

Workflow Overview

This workflow combines character style transfer with prompt reverse engineering. It uses:

  1. IPAdapter + InstantID for facial feature preservation

  2. Meta-Llama-3.1-8B for automatic prompt generation

  3. niji-Anime-SDXL model for anime-style image generation

Key Models:

  • niji-动漫二次元-sdxl_2.0 (动漫二次元: "anime, 2D style"): Anime-style generation

  • unsloth/Meta-Llama-3.1-8B-Instruct: Prompt reverse engineering

  • ip-adapter_instant_id_sdxl: Identity preservation

Node Breakdown

  1. Joy_caption_two_load

    • Purpose: Loads Meta-Llama for prompt reverse engineering

    • Install: Manual install via ComfyUI-Custom-Scripts

    • Depends: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit (requires a Hugging Face login)

  2. InstantIDModelLoader

    • Purpose: Loads InstantID for facial feature binding

    • Install: Requires ComfyUI-InstantID

    • Depends: ip-adapter_instant_id_sdxl.safetensors

  3. IPAdapterAdvanced

    • Purpose: Advanced style adaptation

    • Install: Install IPAdapter Plus via ComfyUI Manager
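
Before loading the workflow, it can help to confirm that the model files named above are actually on disk. The following is a minimal sketch, not part of the original workflow: the `instantid` folder name and the `ComfyUI/models` root are assumptions about a typical install, and `missing_models` is a hypothetical helper. Only the InstantID file name comes from the node list above.

```python
from pathlib import Path

# Assumed layout: the folder name on the left is a guess at a typical install,
# adjust it to wherever your custom nodes expect the file.
REQUIRED_FILES = {
    "instantid": ["ip-adapter_instant_id_sdxl.safetensors"],
}


def missing_models(models_root, required=REQUIRED_FILES):
    """Return the relative paths of required files that are not on disk."""
    root = Path(models_root)
    missing = []
    for folder, names in required.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


# Example usage (path is an assumption):
# for path in missing_models("ComfyUI/models"):
#     print("missing:", path)
```

An empty list means every listed file was found. Add further entries to `REQUIRED_FILES` for the checkpoint and any other models your copy of the workflow references.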

Workflow Structure

  • Group 1: Prompt Reverse Engineering

    • Input: Reference image → Joy_caption_two

    • Output: Generated description (e.g., "A maid with golden eyes...")

  • Group 2: Facial Feature Fusion

    • Input: Source face + InstantID parameters

    • Output: Latent data with facial features

  • Group 3: Image Generation

    • Input: Reverse-engineered prompts + fused model

    • Output: Final anime-style image
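
The dependency between the three groups can be sketched as a small data structure: Group 3 can only run once Groups 1 and 2 have produced their outputs. This is an illustration of the data flow only. The artifact labels (`reference_image`, `prompt`, `face_features`, and so on) are made-up names for this sketch, not node or socket names from the workflow.

```python
# Artifact labels are illustrative, not real ComfyUI socket names.
GROUPS = [
    {"name": "Prompt Reverse Engineering",
     "needs": {"reference_image"},
     "produces": {"prompt"}},
    {"name": "Facial Feature Fusion",
     "needs": {"source_face"},
     "produces": {"face_features"}},
    {"name": "Image Generation",
     "needs": {"prompt", "face_features"},
     "produces": {"final_image"}},
]


def run_order(groups, initial_inputs):
    """Walk the groups in order and fail early if an input is not yet available."""
    available = set(initial_inputs)
    order = []
    for group in groups:
        lacking = group["needs"] - available
        if lacking:
            raise ValueError(f"{group['name']} is missing: {sorted(lacking)}")
        available |= group["produces"]
        order.append(group["name"])
    return order
```

Calling `run_order(GROUPS, {"reference_image", "source_face"})` returns the three group names in order, while omitting either input image raises a `ValueError` naming the group that cannot run.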

Inputs & Outputs

  • Inputs:

    1. Style reference image (768x1024 PNG)

    2. Source face image (recommended 512x512)

    3. Negative prompts (pre-configured)

  • Output:

    • 768x1024 anime-style image
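
As a quick sanity check on these sizes: Stable Diffusion family VAEs, SDXL included, work on latents that are 1/8 of the pixel resolution on each side, so image dimensions should be multiples of 8. The helper below is a hypothetical sketch of that arithmetic, not a function from ComfyUI.

```python
VAE_SCALE = 8  # SD / SDXL VAEs downsample each side by a factor of 8


def latent_size(width, height):
    """Return the (width, height) of the latent for a given pixel size."""
    if width % VAE_SCALE or height % VAE_SCALE:
        raise ValueError(f"{width}x{height} is not a multiple of {VAE_SCALE}")
    return width // VAE_SCALE, height // VAE_SCALE


print(latent_size(768, 1024))  # (96, 128)
print(latent_size(512, 512))   # (64, 64)
```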

Notes

⚠️ Required Dependencies:

  1. ComfyUI-InstantID plugin

  2. IPAdapter Plus plugin

  3. Meta-Llama-3.1-8B (requires 16 GB of VRAM)

💡 Tips:

  • Use the VIT-G (medium strength) preset to balance speed and quality

  • Reduce the resolution to 512x768 if VRAM is insufficient
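
To see how much the lower resolution saves, compare pixel counts. The sketch below is plain arithmetic with a hypothetical helper name. Actual VRAM use depends on the models and sampler settings, so treat the ratio as a rough guide to how much less image data is processed per step, not as a memory measurement.

```python
def pixel_ratio(reduced, full):
    """Return the pixel count of `reduced` as a fraction of `full`."""
    reduced_w, reduced_h = reduced
    full_w, full_h = full
    return (reduced_w * reduced_h) / (full_w * full_h)


print(pixel_ratio((512, 768), (768, 1024)))  # 0.5
```

Dropping from 768x1024 to 512x768 halves the number of pixels, and therefore the number of latent positions, in each generated image.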