Modality Router LoRA - Smart Output Modality Selection

Part of the MoM (Mixture of Models) family for vLLM Semantic Router.

A LoRA adapter fine-tuned on mmbert-32k-yarn (a 307M-parameter ModernBERT with 32K context and support for 1800+ languages) that classifies user-prompt intent into one of three response modalities:

| Label | Description | Routed To | Example |
|-------|-------------|-----------|---------|
| AR | Text-only response | Autoregressive LLM (e.g., Llama, Qwen) | "What is the capital of France?" |
| DIFFUSION | Image generation | Diffusion model (e.g., Flux, SDXL) | "A cyberpunk city at night, neon lights" |
| BOTH | Text + image response | Both AR + Diffusion pipeline | "Explain photosynthesis and show a diagram" |

Usage

With PEFT (LoRA adapter)

from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load base model + LoRA adapter
base_model = AutoModelForSequenceClassification.from_pretrained(
    "llm-semantic-router/mmbert-32k-yarn", num_labels=3
)
model = PeftModel.from_pretrained(base_model, "llm-semantic-router/mmbert32k-modality-router-lora")
tokenizer = AutoTokenizer.from_pretrained("llm-semantic-router/mmbert32k-modality-router-lora")

# Label mapping
labels = {0: "AR", 1: "DIFFUSION", 2: "BOTH"}

# Classify prompts
prompts = [
    "What are the benefits of exercise?",
    "A serene Japanese garden with cherry blossoms, watercolor style",
    "Explain how neural networks work and generate a diagram showing the architecture",
]

model.eval()
for prompt in prompts:
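    # Truncate at 512 tokens for speed; the base model itself supports up to 32K context.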
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    pred = torch.argmax(outputs.logits, dim=-1).item()
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    print(f"Prompt: {prompt[:60]}...")
    print(f"  -> {labels[pred]} (confidence: {probs[pred]:.3f})")
    print()

Use the merged model (recommended for production)

For easier deployment without a PEFT dependency, use the merged version: llm-semantic-router/mmbert32k-modality-router-merged

from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="llm-semantic-router/mmbert32k-modality-router-merged",
)
result = pipe("Draw a picture of a sunset over mountains")
print(result)  # [{'label': 'DIFFUSION', 'score': 0.97}]
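
To get the full score distribution over all three labels (useful for thresholding or logging), ask the pipeline for every class instead of just the top one:

result = pipe("Explain photosynthesis and show a diagram", top_k=None)
print(result)  # scores for AR, DIFFUSION, and BOTH, sorted by score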

Model Details

Architecture

  • Base model: llm-semantic-router/mmbert-32k-yarn (307M parameters, ModernBERT + YaRN RoPE)
  • Context length: 32,768 tokens
  • Languages: 1800+ (via Gemma 2 tokenizer with 256K vocab)
  • Adaptation: LoRA (Low-Rank Adaptation) via PEFT; see the configuration sketch after this list
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.1
  • Target modules: attn.Wqkv, attn.Wo, mlp.Wi, mlp.Wo
  • Modules saved: classifier, score
  • Task type: Sequence Classification (3 classes)
  • Trainable parameters: ~3.4M (1.09% of total 310M)
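
In PEFT terms, the settings above correspond to roughly the following configuration (a sketch inferred from this card, not the exact training script):

from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,                                   # sequence classification
    r=16,                                                         # LoRA rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["attn.Wqkv", "attn.Wo", "mlp.Wi", "mlp.Wo"],  # ModernBERT attention/MLP projections
    modules_to_save=["classifier", "score"],                      # heads are trained fully, not low-rank
)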

Training

  • Epochs: 10
  • Batch size: 32
  • Learning rate: 2e-5
  • Weight decay: 0.15 (adaptive)
  • Loss function: Focal Loss (gamma=2.0) with inverse-frequency class weights
  • Class imbalance handling: Focal Loss + sqrt-dampened class weights + minority oversampling; see the loss sketch after this list
  • Hardware: AMD Instinct MI300X GPU (192GB VRAM)
  • Training time: ~2 minutes
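
Focal loss shrinks the contribution of examples the model already classifies confidently, so training concentrates on hard and minority-class prompts. A minimal sketch of a class-weighted focal loss matching the settings above (the training script is not published, so this is illustrative):

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_counts, gamma=2.0):
    # Inverse-frequency class weights, sqrt-dampened so rare classes
    # are up-weighted without dominating the loss.
    weights = (class_counts.sum() / class_counts.float()).sqrt()
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p(true class)
    focal = (1.0 - log_pt.exp()) ** gamma                          # down-weights easy examples
    return -(weights[targets] * focal * log_pt).mean()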

Training Data

The model was trained on a curated combination of 10 public datasets plus seed examples (a loading sketch follows the tables below):

DIFFUSION class (image generation intent)

| Dataset | Size | Description |
|---------|------|-------------|
| Gustavosta/Stable-Diffusion-Prompts | 80K | Curated Stable Diffusion prompts |
| FredZhang7/stable-diffusion-prompts-2.47M | 2.47M | Large-scale SD prompt collection |
| nateraw/parti-prompts | 1.6K | Google Parti benchmark prompts |
| fal/image-generation-prompts | 1K+ | Diverse image generation prompts |
| allenai/WildChat (mined) | - | Real user prompts with image-generation intent |

AR class (text-only intent)

| Dataset | Size | Description |
|---------|------|-------------|
| OpenAssistant/oasst2 | 135K | Multilingual instruction conversations |
| tatsu-lab/alpaca | 52K | Stanford instruction-following |
| databricks/databricks-dolly-15k | 15K | Categorized instructions |
| stingning/ultrachat | 1.5M | Multi-turn conversations |
| allenai/WildChat (mined) | - | Real user text-only prompts |

BOTH class (mixed modality intent)

| Dataset | Size | Description |
|---------|------|-------------|
| mqliu/InterleavedBench | 7K+ | Gold-standard interleaved text+image prompts (EMNLP 2024) |
| allenai/WildChat (mined) | - | Real user multimodal prompts |
| Seed examples | 40+ | Curated diverse domain examples |
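
As an illustration, assembling such a mix with the datasets library could look like the following; the column names come from the public datasets, but the actual mining, balancing, and oversampling pipeline is not published, so treat this as a sketch:

from datasets import load_dataset

diffusion = load_dataset("Gustavosta/Stable-Diffusion-Prompts", split="train")
ar = load_dataset("databricks/databricks-dolly-15k", split="train")

# Label ids follow the mapping in the Usage section: 0=AR, 1=DIFFUSION, 2=BOTH.
examples = (
    [{"text": row["Prompt"], "label": 1} for row in diffusion]
    + [{"text": row["instruction"], "label": 0} for row in ar]
)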

Evaluation Results

| Metric | Value |
|--------|-------|
| Accuracy | 0.9686 |
| F1 (weighted) | 0.9686 |
| Eval Loss | 0.0435 |

Per-class Performance

| Class | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| AR | 0.956 | 0.967 | 0.962 |
| DIFFUSION | 0.974 | 0.979 | 0.977 |
| BOTH | 0.983 | 0.951 | 0.967 |

Intended Use

This model is designed for routing LLM requests in multi-model serving systems like vLLM Semantic Router. It enables:

  • Smart Output Modality Selection: Automatically determine whether a user query needs text, image, or both
  • Automatic Paradigm Routing: Route requests to the right model backend (AR LLM vs. Diffusion model); see the routing sketch after this list
  • Cost Optimization: Avoid sending simple text queries to expensive image generation pipelines
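
A concrete routing shim might look like the following; the backend names are hypothetical placeholders, not vLLM Semantic Router's actual API:

from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="llm-semantic-router/mmbert32k-modality-router-merged",
)

# Hypothetical backend names; substitute your own AR / diffusion endpoints.
BACKENDS = {
    "AR": ["llm"],             # text-only: autoregressive LLM
    "DIFFUSION": ["image"],    # image-only: diffusion model
    "BOTH": ["llm", "image"],  # interleaved: fan out to both pipelines
}

def route(prompt: str) -> list[str]:
    label = pipe(prompt)[0]["label"]
    return BACKENDS[label]

print(route("Draw a picture of a sunset over mountains"))  # ['image']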

Out-of-Scope Use

  • Not suitable for content moderation or safety classification
  • Not designed for multi-turn conversation context (single-turn prompt classification only)
  • May have reduced accuracy for very short or ambiguous prompts; a simple fallback is sketched after this list
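
One mitigation for ambiguous prompts is to fall back to text-only routing when the classifier is unsure; the 0.6 threshold below is an arbitrary illustration, not a tuned value:

# `pipe` is the text-classification pipeline from the Usage section.
result = pipe("hmm, a sunset?")[0]
label = result["label"] if result["score"] >= 0.6 else "AR"  # default to text when unsure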

Citation

@misc{modality-router-2025,
  title={Modality Router: Smart Output Modality Selection for Multi-Model Serving},
  author={vLLM Semantic Router Team},
  year={2025},
  url={https://huggingface.co/llm-semantic-router/mmbert32k-modality-router-lora}
}

Framework Versions

  • PEFT: 0.18.1
  • Transformers: 4.57.6
  • PyTorch: 2.9.1
  • Python: 3.12