metadata
license: apache-2.0
base_model:
  - timm/tf_efficientnetv2_s.in21k_ft_in1k
  - Ultralytics/YOLO11
tags:
  - comfyui
  - object-detection
  - face-detection
  - face-segmentation
  - pytorch
  - image-segmentation

Custom-trained models for face detection and segmentation across realistic, anime, and NSFW content.

Made for the Forbidden Vision ComfyUI custom nodes

GitHub Repository
Support me on Ko-fi


🎯 Why These Models Exist

Traditional face models fail where it matters most for AI art workflows:

| Problem | Why It Matters |
| --- | --- |
| 🎨 Domain-locked | Existing models excel at either anime or realistic content, never both |
| 🔞 NSFW blindness | Most models are trained only on SFW data and break on adult content |
| 👁️‍🗨️ Detail blindness | Most models miss fine features such as anime eyebrows and real eyelashes |
| 🎲 Generation artifacts | Standard datasets don't include diffusion model quirks and failures |

These models are trained to address all four.

Mask Example

The segmentation model predicts face masks that include stylized eyebrows, eyelashes, and similar fine details.


📊 Training Foundation

The Dataset Difference

Built from 14,000+ manually annotated images across the domains that actually matter for AI generation:

🎨 Multi-Domain Coverage

  • SDXL, SD1.5, Pony, Illustrious outputs
  • Curated Danbooru (anime styles)
  • Real photography
  • Full NSFW inclusion (no filtering)

💎 Edge Case Priority

  • ✓ Extreme angles & occlusions
  • ✓ Failed/broken generations
  • ✓ Low-quality artifacts
  • ✓ Unusual expressions & poses
  • ✓ Everything other models ignore

What This Means For You

Traditional models: Trained on clean celebrity faces
         ↓
    Fail on real workflows

These models: Trained on what you actually generate
         ↓
    Work when you need them

One model family. Every domain. Zero compromises.

Model Details

Face Detection (YOLOv11-Small)

Purpose: Primary face detection with high recall

Training Approach:

  • After each training run, I ran the model on a new mixed dataset, hard-mined the failures, and improved the dataset, repeating until performance was acceptable
  • Trained at 640px resolution (inference should use the same resolution)
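
The hard-mining step above can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's tooling: the README does not say how failures were identified, so the sketch assumes reference boxes exist for the mined images and flags any image with a missed face or a spurious detection. The names `box_iou` and `find_hard_examples` and the 0.5 IoU threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def find_hard_examples(
    ground_truth: Dict[str, List[Box]],
    predictions: Dict[str, List[Box]],
    iou_threshold: float = 0.5,  # assumed threshold, not from the README
) -> List[str]:
    """Return ids of images with a missed face or a spurious detection."""
    hard = []
    for image_id, gt_boxes in ground_truth.items():
        pred_boxes = predictions.get(image_id, [])
        # A reference face that no prediction overlaps well enough.
        missed = any(
            all(box_iou(gt, p) < iou_threshold for p in pred_boxes)
            for gt in gt_boxes
        )
        # A prediction that matches no reference face.
        spurious = any(
            all(box_iou(p, gt) < iou_threshold for gt in gt_boxes)
            for p in pred_boxes
        )
        if missed or spurious:
            hard.append(image_id)
    return hard
```

Images returned by `find_hard_examples` would then be reviewed, re-annotated where needed, and added to the training set before the next run.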

Why YOLOv11-Small instead of nano?
It detects more reliably across mixed realistic/anime domains, with an acceptable speed tradeoff.


Segmentation (EfficientNet-v2)

Purpose: Precise face mask generation

Training Approach:

  • Dataset prepared using the Forbidden Vision YOLO model at 512px resolution
  • Iterative hard-mining training in multiple phases:
    • Train on the initial 700 samples
    • Evaluate on remaining images to find failure cases
    • Correct failed masks and add them to the dataset
    • Retrain with the expanded dataset
    • Repeat until failure cases drop to near-zero
      (final dataset: 4k+ images)
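
The phases above amount to a loop that retrains, evaluates the remaining images, and moves the failures into the training set. The sketch below shows that control flow only. The `train` and `is_failure` callables are hypothetical stand-ins for the real training run and for whatever review process decided a mask had failed (the README does not specify one), and `mask_iou` is just one possible scoring helper.

```python
from typing import Callable, List, Sequence

Mask = Sequence[int]  # flattened binary mask, 1 = face pixel


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0  # two empty masks agree


def hard_mine(
    seed_ids: List[str],
    pool_ids: List[str],
    train: Callable[[List[str]], Callable[[str], Mask]],
    is_failure: Callable[[str, Mask], bool],
    max_rounds: int = 10,  # assumed safety cap, not from the README
) -> List[str]:
    """Grow the training set with failure cases until none are found."""
    train_ids = list(seed_ids)
    seen = set(seed_ids)
    remaining = [i for i in pool_ids if i not in seen]
    for _ in range(max_rounds):
        predict = train(train_ids)  # retrain on the current set
        failures = [i for i in remaining if is_failure(i, predict(i))]
        if not failures:
            break
        # In the real workflow the failed masks are corrected by hand here.
        train_ids.extend(failures)
        failed = set(failures)
        remaining = [i for i in remaining if i not in failed]
    return train_ids
```

Passing the training routine and the failure check as callables keeps the loop independent of any particular framework; in practice `is_failure` could just as well be a manual review step.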

Features:

  • Detects and includes facial features that other models ignore, such as protruding anime eyebrows and realistic eyelashes that extend beyond the face
  • Glasses and similar accessories are treated as part of the face, even where they extend beyond the face outline
  • NSFW-friendly across anime, realistic, and 3D domains

Usage

These models are automatically downloaded and used by the Fixer node in ComfyUI Forbidden Vision.

License

Apache 2.0


Contact