# Brick Complexity Extractor

LoRA fine-tune of Qwen/Qwen3.5-0.8B for query complexity classification (`easy` / `medium` / `hard`). Used in the Brick LLM routing system to decide which model tier should handle a query.
## Training
- Base model: Qwen3.5-0.8B
- Method: LoRA (r=16, alpha=32, dropout=0.05)
- Dataset: regolo/brick-complexity-extractor — 65K samples labeled by Qwen3.5-122B acting as an LLM judge
- Epochs: 3, LR: 2e-4 (cosine), Batch: 32
- Hardware: NVIDIA H200 141GB, bf16
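The hyperparameters above translate to a `peft` adapter config roughly like the following sketch. Note that `target_modules` is an assumption (typical attention projections for Qwen-style models); the actual modules used for this adapter are not stated in this card.

```python
from peft import LoraConfig

# Sketch of the adapter config implied by the hyperparameters above.
# target_modules is an assumption, not taken from this card.
lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```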
## Evaluation (test set, 3,841 samples)
| Class | Precision | Recall | F1 |
|---|---|---|---|
| easy | 81.3% | 80.4% | 80.8% |
| medium | 77.6% | 80.8% | 79.2% |
| hard | 72.7% | 65.1% | 68.7% |
| accuracy | 78.1% | ||
| macro avg | 77.2% | 75.4% | 76.2% |
Average confidence: 91.7%
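As a sanity check, the macro averages in the table are the unweighted means of the three per-class scores:

```python
# Per-class scores from the table above (percent)
precision = {"easy": 81.3, "medium": 77.6, "hard": 72.7}
recall = {"easy": 80.4, "medium": 80.8, "hard": 65.1}
f1 = {"easy": 80.8, "medium": 79.2, "hard": 68.7}

macro = lambda d: round(sum(d.values()) / len(d), 1)
print(macro(precision), macro(recall), macro(f1))  # 77.2 75.4 76.2
```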
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch.nn.functional as F

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-0.8B", torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "regolo/brick-complexity-extractor").eval().cuda()
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)

# Classification via logit extraction: read the next-token logits for the three
# label tokens instead of generating free-form text.
LABELS = ["easy", "medium", "hard"]
label_ids = {l: tokenizer.encode(l, add_special_tokens=False)[0] for l in LABELS}

messages = [
    {"role": "system", "content": "<system prompt from training_metadata.json>"},
    {"role": "user", "content": "Classify: Design a lock-free concurrent skip-list with MVCC"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1, :]  # next-token logits at the last position

# Softmax over the three label logits only
probs = F.softmax(torch.stack([logits[label_ids[l]] for l in LABELS]).float(), dim=0)
label = LABELS[probs.argmax()]
confidence = probs.max().item()
print(f"{label} ({confidence:.2%})")  # hard (94.12%)
```
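The `(label, confidence)` pair can then drive tier selection. A minimal routing sketch follows; the tier names and the confidence threshold are illustrative assumptions, not part of the Brick system described in this card.

```python
# Hypothetical routing table: tier names and the 0.5 threshold are
# illustrative assumptions, not taken from this card.
MODEL_TIERS = {"easy": "small-tier", "medium": "mid-tier", "hard": "large-tier"}


def route(label: str, confidence: float, fallback: str = "mid-tier") -> str:
    """Pick a model tier from the classifier output; fall back when unsure."""
    if confidence < 0.5:  # low confidence: prefer a safe middle tier
        return fallback
    return MODEL_TIERS[label]


print(route("hard", 0.9412))  # large-tier
print(route("easy", 0.41))   # mid-tier (low confidence)
```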
## License
CC-BY-NC-4.0