ForbiddenVision_Models

luxdelux7


cd5b2d0 verified about 1 month ago

metadata

license: apache-2.0
base_model:
  - timm/tf_efficientnetv2_s.in21k_ft_in1k
  - Ultralytics/YOLO11
tags:
  - comfyui
  - object-detection
  - face-detection
  - face-segmentation
  - pytorch
  - image-segmentation

Custom-trained models for face detection and segmentation across realistic, anime, and NSFW content.

Made for the Forbidden Vision ComfyUI custom nodes.

GitHub Repository

🎯 Why These Models Exist

Traditional face models fail where it matters most for AI art workflows:

Problem	Why It Matters
🎨 Domain-locked	Existing models excel at either anime or realistic—never both
🔞 NSFW blindness	Most models trained only on SFW data break on adult content
👁️‍🗨️ Detail blindness	Most models miss fine details such as anime eyebrows and realistic eyelashes
🎲 Generation artifacts	Standard datasets don't include diffusion model quirks and failures

These models are designed to address all four.

The segmentation model predicts face masks that include stylized eyebrows, eyelashes, and similar fine features.

📊 Training Foundation

The Dataset Difference

Built from 14,000+ manually annotated images across the domains that actually matter for AI generation:

🎨 Multi-Domain Coverage

SDXL, SD1.5, Pony, Illustrious outputs
Curated Danbooru (anime styles)
Real photography
Full NSFW inclusion (no filtering)

💎 Edge Case Priority

✓ Extreme angles & occlusions
✓ Failed/broken generations
✓ Low-quality artifacts
✓ Unusual expressions & poses
✓ Everything other models ignore

What This Means For You

Traditional models: Trained on clean celebrity faces
         ↓
    Fail on real workflows

These models: Trained on what you actually generate
         ↓
    Work when you need them

One model family. Every domain. Zero compromises.

Model Details

Face Detection (YOLOv11-Small)

Purpose: Primary face detection with high recall

Training Approach:

After every training run, I ran the model on a new mixed dataset, hard-mined its failures, and fed the corrected samples back into the training set, repeating until performance was acceptable
Trained at 640px resolution (inference should use the same resolution)
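Running inference at the training resolution usually means letterboxing: the image is scaled to fit inside a 640×640 square, the remainder is padded, and predicted boxes are then mapped back to the original pixels. The sketch below shows that arithmetic in plain Python, assuming the common "scale to fit, then center-pad" scheme. It is not taken from the Forbidden Vision code, and `letterbox_params` / `box_to_original` are hypothetical helper names. (If you load the detector yourself through the Ultralytics package, the `imgsz=640` argument to `predict` sets the inference size.)

```python
# Hypothetical helpers illustrating letterbox preprocessing for a detector
# trained at 640px. Assumes "scale to fit, then center-pad"; the Fixer node's
# actual preprocessing may differ.
TRAIN_SIZE = 640

def letterbox_params(width, height, size=TRAIN_SIZE):
    """Return (scale, pad_x, pad_y) that fit a width x height image into a
    size x size square without distorting its aspect ratio."""
    scale = min(size / width, size / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (size - new_w) / 2   # horizontal padding on each side
    pad_y = (size - new_h) / 2   # vertical padding on each side
    return scale, pad_x, pad_y

def box_to_original(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from letterboxed input coordinates
    back to pixel coordinates in the original image."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)
```

For a 1920×1080 frame the scale is 1/3 and the image is padded by 140px above and below, so a box predicted at (320, 300, 400, 380) in the 640px input corresponds to roughly (960, 480, 1200, 720) in the original.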

Why YOLOv11-Small instead of nano?
More reliable detection across mixed realistic/anime domains, with an acceptable speed trade-off.

Segmentation (EfficientNet-v2)

Purpose: Precise face mask generation

Training Approach:

Dataset prepared using the Forbidden Vision YOLO model at 512px resolution
Iterative hard-mining training in multiple phases:
- Train on the initial 700 samples
- Evaluate on remaining images to find failure cases
- Correct failed masks and add them to the dataset
- Retrain with the expanded dataset
- Repeat until failure cases drop to near-zero
  (final dataset: 4k+ images)
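The phased loop above can be sketched as follows. This illustrates the described process rather than reproducing the author's training code: `train`, `find_failures` and `correct_masks` are hypothetical callbacks standing in for model training, evaluation on the not-yet-used images, and manual mask correction, and `max_failures` is an assumed stand-in for "near-zero".

```python
from typing import Callable, Iterable, List, Set, Tuple, TypeVar

M = TypeVar("M")  # whatever object the hypothetical train() returns

def hard_mine(
    initial: List[str],
    pool: Iterable[str],
    train: Callable[[List[str]], M],
    find_failures: Callable[[M, List[str]], Set[str]],
    correct_masks: Callable[[Set[str]], List[str]],
    max_failures: int = 0,
    max_rounds: int = 10,
) -> Tuple[M, List[str]]:
    """Grow the dataset with corrected failure cases until the model fails on
    at most `max_failures` held-back images (or `max_rounds` is reached)."""
    dataset = list(initial)
    seen = set(initial)
    remaining = [img for img in pool if img not in seen]  # not yet trained on
    model = train(dataset)
    for _ in range(max_rounds):
        failures = find_failures(model, remaining)        # evaluate on the rest
        if len(failures) <= max_failures:                 # "near-zero" reached
            break
        dataset.extend(correct_masks(failures))           # corrected masks join
        remaining = [img for img in remaining if img not in failures]
        model = train(dataset)                            # retrain, expanded set
    return model, dataset
```

Keeping the evaluation pool separate from the training set is what lets each round surface new failure cases; corrected samples move from the pool into the dataset, so they are not evaluated again.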

Features:

Detects and includes facial features other models ignore, such as protruding anime eyebrows and realistic eyelashes sticking out past the face outline
Glasses and similar accessories are treated as part of the face, even when they extend beyond the face shape
NSFW-friendly across anime, realistic, and 3D domains
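Capturing features that extend past the face outline also depends on the crop handed to the segmentation model: a tight detector box would clip protruding eyebrows or glasses before any mask is predicted. A minimal sketch of a padded square crop follows, assuming a 25% margin on each side (an illustrative value, not one documented here) and assuming the 512px resolution mentioned under Training Approach is the segmentation input size. `square_crop`, `seg_scale`, `SEG_SIZE` and `MARGIN` are hypothetical names.

```python
# Hypothetical helper: expand a detector box into a square crop for the
# segmentation model. MARGIN and SEG_SIZE are assumed values.
SEG_SIZE = 512   # assumed segmentation input size (px)
MARGIN = 0.25    # assumed extra context on each side, as a fraction of the box

def square_crop(box, img_w, img_h, margin=MARGIN):
    """Return an integer (x1, y1, x2, y2) square window around `box`,
    enlarged by `margin` on each side and kept inside the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    side = round(max(x2 - x1, y2 - y1) * (1 + 2 * margin))
    side = max(1, min(side, img_w, img_h))   # never larger than the image
    left = round(cx - side / 2)
    top = round(cy - side / 2)
    # Slide the window back inside the image instead of shrinking it,
    # so the crop stays square.
    left = min(max(left, 0), img_w - side)
    top = min(max(top, 0), img_h - side)
    return left, top, left + side, top + side

def seg_scale(crop):
    """Resize factor that maps the square crop onto SEG_SIZE x SEG_SIZE."""
    x1, _, x2, _ = crop
    return SEG_SIZE / (x2 - x1)
```

Sliding the window rather than clipping it keeps the crop square, so a single resize factor maps it to the 512×512 input and its inverse maps the predicted mask back.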

Usage

These models are automatically downloaded and used by the Fixer node in ComfyUI Forbidden Vision.

License

Apache 2.0

Contact

GitHub: ComfyUI-Forbidden-Vision
Issues: GitHub Issues
Support: Ko-fi