# Brick Complexity Extractor

LoRA fine-tune of Qwen/Qwen3.5-0.8B for query complexity classification (`easy` / `medium` / `hard`). Used in the Brick LLM routing system to decide which model tier should handle a query.
## Training
- Base model: Qwen3.5-0.8B
- Method: LoRA (r=16, alpha=32, dropout=0.05)
- Dataset: regolo/brick-complexity-extractor — 65K samples labeled by Qwen3.5-122B acting as an LLM judge
- Epochs: 3, LR: 2e-4 (cosine), Batch: 32
- Hardware: NVIDIA H200 141GB, bf16
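The hyperparameters above translate to a `peft` adapter config roughly like the following sketch. Note that `target_modules` is an assumption (typical attention projections for Qwen-style models); the actual modules used for this adapter are not stated in this card.

```python
from peft import LoraConfig

# Sketch of the adapter config implied by the hyperparameters above.
# target_modules is an assumption, not taken from this card.
lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```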
## Evaluation (test set, 3,841 samples)
| Class | Precision | Recall | F1 |
|---|---|---|---|
| easy | 81.3% | 80.4% | 80.8% |
| medium | 77.6% | 80.8% | 79.2% |
| hard | 72.7% | 65.1% | 68.7% |
| accuracy | 78.1% | ||
| macro avg | 77.2% | 75.4% | 76.2% |
Average confidence: 91.7%
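As a sanity check, the macro averages in the table are the unweighted means of the three per-class scores:

```python
# Per-class scores from the table above (percent)
precision = {"easy": 81.3, "medium": 77.6, "hard": 72.7}
recall = {"easy": 80.4, "medium": 80.8, "hard": 65.1}
f1 = {"easy": 80.8, "medium": 79.2, "hard": 68.7}

macro = lambda d: round(sum(d.values()) / len(d), 1)
print(macro(precision), macro(recall), macro(f1))  # 77.2 75.4 76.2
```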
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch.nn.functional as F

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-0.8B", torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "regolo/brick-complexity-extractor").eval().cuda()
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-0.8B", trust_remote_code=True)

# Classification via logit extraction: read the next-token logits for the three
# label tokens instead of generating free-form text.
LABELS = ["easy", "medium", "hard"]
label_ids = {l: tokenizer.encode(l, add_special_tokens=False)[0] for l in LABELS}

messages = [
    {"role": "system", "content": "<system prompt from training_metadata.json>"},
    {"role": "user", "content": "Classify: Design a lock-free concurrent skip-list with MVCC"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1, :]  # next-token logits at the last position

# Softmax over the three label logits only
probs = F.softmax(torch.stack([logits[label_ids[l]] for l in LABELS]).float(), dim=0)
label = LABELS[probs.argmax()]
confidence = probs.max().item()
print(f"{label} ({confidence:.2%})")  # hard (94.12%)
```
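The `(label, confidence)` pair can then drive tier selection. A minimal routing sketch follows; the tier names and the confidence threshold are illustrative assumptions, not part of the Brick system described in this card.

```python
# Hypothetical routing table: tier names and the 0.5 threshold are
# illustrative assumptions, not taken from this card.
MODEL_TIERS = {"easy": "small-tier", "medium": "mid-tier", "hard": "large-tier"}


def route(label: str, confidence: float, fallback: str = "mid-tier") -> str:
    """Pick a model tier from the classifier output; fall back when unsure."""
    if confidence < 0.5:  # low confidence: prefer a safe middle tier
        return fallback
    return MODEL_TIERS[label]


print(route("hard", 0.9412))  # large-tier
print(route("easy", 0.41))   # mid-tier (low confidence)
```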
## License
CC-BY-NC-4.0