Transform Your Photos into Anime Masterpieces with AI!
Workflow Overview

Purpose and Function:
This workflow automatically converts real-life photos into anime-style images. It combines upscaling models, VAE encoding/decoding, sampling, and facial enhancement to generate high-quality anime images with refined facial features and improved resolution.
Core Features:
Image Preprocessing: Loads and resizes the input image.
Anime Style Conversion: Uses custom anime-style models to transform the real image.
Facial Refinement: Enhances facial details and fixes imperfections.
Final Output: Displays and saves the high-resolution anime-style image.
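The four stages above can be sketched as a simple pipeline. The helper functions below are illustrative stand-ins, not real ComfyUI APIs; they only record which stage ran, to show the order the groups later in this document follow:

```python
# Hypothetical stage stubs -- each appends its name to a trace list so the
# ordering is visible. The real workflow wires ComfyUI nodes instead.
def upscale(trace):      return trace + ["upscale"]       # 4x-AnimeSharp pass
def vae_encode(trace):   return trace + ["vae_encode"]    # pixels -> latent
def ksample(trace):      return trace + ["ksample"]       # anime-style sampling
def vae_decode(trace):   return trace + ["vae_decode"]    # latent -> pixels
def refine_faces(trace): return trace + ["refine_faces"]  # facial enhancement

def run_workflow(trace):
    """Apply the workflow stages in the order described above."""
    for stage in (upscale, vae_encode, ksample, vae_decode, refine_faces):
        trace = stage(trace)
    return trace
```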
Core Models
WAI_NSFW-illustrious-SDXL_v11
Function:
The primary model for transforming real-life photos into anime-style images.
Installation:
Install using ComfyUI Manager or manually:
Download the .safetensors file and place it in the models/Stable-diffusion directory.
4x-AnimeSharp (Upscaling Model)
Function:
Enhances the image resolution and sharpens details.
Installation:
Place the model file in models/UpscaleModels. Alternatively, install it using ComfyUI Manager.
Nodes Explanation
LoadImage
Function:
Loads the real-life image into the workflow.
Input:
Image file path.
Output:
Image data.
UpscaleModelLoader
Function:
Loads the upscaling model used to enhance the image resolution.
Parameters:
4x-AnimeSharp
Output:
Upscale model.
ImageUpscaleWithModel
Function:
Applies the upscaling model to enlarge the image.
Input:
Image
Upscale model
Output:
Upscaled image.
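To illustrate the dimensional effect of a 4x upscaler, here is a toy nearest-neighbor enlargement in plain Python. It mirrors only the size change; the real 4x-AnimeSharp model also reconstructs sharp detail, which this sketch does not:

```python
def upscale_nearest(pixels, factor=4):
    """Enlarge a 2D pixel grid by repeating each pixel factor x factor times.
    A toy stand-in for the size change an upscale model performs."""
    out = []
    for row in pixels:
        # Widen the row: each pixel becomes `factor` copies.
        wide = [p for p in row for _ in range(factor)]
        # Tall-en the image: each widened row becomes `factor` rows.
        for _ in range(factor):
            out.append(list(wide))
    return out
```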
VAEEncode
Function:
Encodes the image into the latent space for further processing.
Input:
Image
VAE model
Output:
Latent image data.
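The encoding is a spatial compression: Stable Diffusion's VAE maps each 8x8 pixel block to a single 4-channel latent value. A small helper shows the resulting latent shape:

```python
def latent_shape(width, height, channels=4, factor=8):
    """Return the (channels, height, width) shape of the latent a Stable
    Diffusion VAE produces for a given pixel resolution."""
    assert width % factor == 0 and height % factor == 0, "dims must be multiples of 8"
    return (channels, height // factor, width // factor)
```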
KSampler
Function:
Samples the latent image to generate the anime-style output.
Parameters:
Sampling method:
euler_ancestral
Sampling steps:
30
CFG scale:
0.6
Input:
Model
Positive and negative prompts
Latent image
Output:
Latent anime-style image.
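The CFG scale controls how strongly each sampling step follows the prompt. A minimal sketch of the standard classifier-free guidance blend (the textbook formula, not ComfyUI's internal code):

```python
def cfg_combine(uncond, cond, scale):
    """Blend the model's unconditioned and prompt-conditioned noise
    predictions: scale = 1 follows the conditioned prediction exactly,
    larger values push further toward the prompt."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```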
VAEDecode
Function:
Decodes the latent image into a visual image.
Input:
Latent image
VAE model
Output:
Anime-style image.
CLIPTextEncode
Function:
Encodes the textual prompt into conditioning data.
Input:
Text prompt.
Output:
Conditioning data (CONDITIONING).
CLIPSetLastLayer
Function:
Adjusts the last layer of the CLIP model for better prompt guidance.
Input:
CLIP model.
Output:
Modified CLIP model.
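Conceptually this is the "clip skip" setting: conditioning is taken from an earlier hidden layer instead of CLIP's final one, which anime checkpoints are commonly trained against. A toy sketch that represents the per-layer outputs as a plain list (the parameter name matches the node's widget; the list representation is an illustration, not the real model):

```python
def clip_set_last_layer(hidden_states, stop_at_clip_layer=-2):
    """Pick an earlier CLIP hidden layer as the text-encoder output.
    hidden_states: list of per-layer outputs, index 0 = first layer.
    -1 is the usual final layer; -2 is a common choice for anime models."""
    return hidden_states[stop_at_clip_layer]
```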
SaveImage
Function:
Saves the final anime-style image to the local storage.
Input:
Image.
Output:
None (saves the file).
Workflow Structure
Group 1: Image Upload
LoadImage → Loads the real-life image.
UpscaleModelLoader → Loads the 4x-AnimeSharp model.
ImageUpscaleWithModel → Applies the upscaling model to enhance image resolution.
Group 2: Model Prompts
CLIPSetLastLayer → Adjusts the CLIP model's final layer.
CLIPTextEncode → Applies positive and negative prompts:
Positive prompts:
masterpiece, best quality, amazing quality
Negative prompts:
teeth, cleavage, (worst quality:1.65), (low quality:1.2), low resolution, watermark, dark spots, blemishes, dull eyes, wrong teeth, red teeth, bad tooth, multiple people, broken eyelashes
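The parenthesized terms use the "(term:weight)" emphasis syntax, e.g. "(worst quality:1.65)" weights that term at 1.65 while bare terms default to 1.0. A simplified parser for this syntax (ComfyUI's real tokenizer also handles nesting and escapes, which this sketch ignores):

```python
import re

def parse_weights(prompt):
    """Map each comma-separated prompt term to its emphasis weight."""
    weights = {}
    for term in prompt.split(","):
        term = term.strip()
        m = re.fullmatch(r"\((.+):([\d.]+)\)", term)  # matches "(text:1.65)"
        if m:
            weights[m.group(1).strip()] = float(m.group(2))
        elif term:
            weights[term] = 1.0  # unweighted term
    return weights
```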
Group 3: Initial Image Processing
VAEEncode → Encodes the image into the latent space.
KSampler → Samples the latent image using the anime model.
VAEDecode → Decodes the sampled latent image into the final anime-style image.
Group 4: Facial Refinement
The workflow uses facial enhancement to fix imperfections, such as blurry or distorted facial features.
Group 5: 4K Upscaling
ImageUpscaleWithModel → Applies 4K upscaling to improve image quality and resolution.
Group 6: Final Output
SaveImage β Saves the final anime-style image to the local storage.
Inputs & Outputs
Inputs:
Real-life image.
CLIP positive and negative prompts.
Upscaling model.
Anime-style generation model.
Sampling parameters (steps, CFG scale, etc.).
Outputs:
High-resolution anime-style image with refined facial details.
Considerations
Hardware Requirements:
This workflow involves multiple model loads, VAE encoding/decoding, and upscaling, requiring a GPU with at least 12GB of VRAM for optimal performance.
Resolution Limitations:
High-resolution input images (above 1600x1200) may cause VRAM overflow.
To avoid instability, limit the image size to 1600x1200 or smaller.
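A small helper can enforce this limit up front, scaling oversized inputs down while preserving the aspect ratio. The 1600x1200 budget comes from the note above; the multiple-of-8 rounding (so the VAE's 8x downsampling divides evenly) is the only added assumption:

```python
def clamp_resolution(width, height, max_w=1600, max_h=1200):
    """Fit (width, height) inside the max_w x max_h budget, preserving
    aspect ratio; images already within budget pass through unchanged."""
    scale = min(max_w / width, max_h / height, 1.0)
    # Round down to multiples of 8 for clean VAE encoding.
    return (int(width * scale) // 8 * 8, int(height * scale) // 8 * 8)
```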
Compatibility:
Ensure the ComfyUI and model versions are compatible to prevent errors.
Output Quality Control:
Use negative prompts to filter unwanted artifacts, such as noise, blurriness, or facial imperfections.