# SuprFlow Qwen 2.5 1.5B – ASR Post-Processing (4-bit MLX)
A fine-tuned Qwen 2.5 1.5B Instruct model optimized for cleaning up raw speech-to-text transcription output. Designed to run on-device on Apple Silicon Macs via MLX.
## What it does
Takes messy ASR output and produces clean, formatted text:
| Input | Output |
|---|---|
| um so I was thinking we should um probably go with option A | I was thinking we should probably go with option A. |
| the deadline is Thursday I mean Friday | The deadline is Friday. |
| [noise] so the plan is to deploy tomorrow [laughter] | The plan is to deploy tomorrow. |
| pues eh estaba pensando que deberíamos ir con la opción A | Estaba pensando que deberíamos ir con la opción A. |
## Key features

- Filler word removal – removes "um", "uh", "like", "you know", "basically", "actually", and language-specific fillers across 35+ languages
- Noise marker cleanup – strips `[noise]`, `[music]`, `[laughter]`, and similar markers
- Self-correction handling – "Thursday I mean Friday" → "Friday"
- Punctuation & capitalization – adds proper sentence structure
- Technical term preservation – correctly formats CI/CD, API, JSON, JWT, etc.
- Question passthrough – questions are formatted as questions, never answered
- Mode-aware formatting – supports `[Style: ...]` prefixes for message, email, note, and meeting formats (see the sketch after this list)
- Multilingual – trained on 35+ languages including CJK, Arabic, Hindi, and European languages
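A minimal sketch of mode-aware formatting. The transcript, the shortened system prompt, and the assumption that the style tag is simply prepended to the raw transcript in the user turn are all illustrative; the full system prompt is shown under Usage below.

```python
from mlx_lm import load, generate

model, tokenizer = load("SamAmeer/suprflow-qwen25-1.5b-4bit")

# Shortened stand-in; use the full system prompt from the Usage section.
system_prompt = "You are a text formatting tool. Clean up raw speech-to-text output."

# Assumption: the [Style: ...] tag is prepended to the raw transcript.
raw = "um hey can you send me the uh report by end of day"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"[Style: email] {raw}"},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```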
## Performance

Benchmarked on an Apple M4 (24 GB):
| Metric | Value |
|---|---|
| Average inference time | 0.26s |
| Model size | 828 MB |
| Quantization | 4-bit (affine, group_size=64) |
| Consistency (5-pass) | 100% deterministic |
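The numbers above can be sanity-checked with a timing harness along these lines (a sketch, not the exact benchmark script; the test prompt is illustrative):

```python
import time
from mlx_lm import load, generate

model, tokenizer = load("SamAmeer/suprflow-qwen25-1.5b-4bit")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "um so the meeting is um tomorrow at 3 pm"}],
    add_generation_prompt=True,
    tokenize=False,
)

# Run 5 passes, timing each one and checking output stability.
outputs, times = [], []
for _ in range(5):
    start = time.perf_counter()
    outputs.append(generate(model, tokenizer, prompt=prompt, max_tokens=128))
    times.append(time.perf_counter() - start)

print(f"avg {sum(times) / len(times):.2f}s, deterministic: {len(set(outputs)) == 1}")
```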
## Usage with MLX

```python
from mlx_lm import load, generate

# Load the 4-bit quantized model and tokenizer from the Hub.
model, tokenizer = load("SamAmeer/suprflow-qwen25-1.5b-4bit")

messages = [
    {"role": "system", "content": "You are a text formatting tool. Your ONLY job is to clean up raw speech-to-text transcription output. You must NEVER answer questions, add opinions, or generate new content.\n\nRules:\n- Fix punctuation, capitalization, and grammar\n- Fix speech-to-text errors (e.g., homophones, misheard words)\n- Remove filler words (um, uh, you know, like, basically, actually)\n- Remove noise markers ([noise], [music], [laughter])\n- If numbered items or steps are mentioned, format as a numbered list\n- If unordered items are listed, format as bullet points\n- Keep meaning exactly the same - do not add, remove, or rephrase content\n- Return ONLY the cleaned text with no explanations or commentary"},
    {"role": "user", "content": "um so like the meeting is tomorrow at 3 pm and we need to um prepare the slides"},
]

# Render the chat template and generate the cleaned transcript.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
result = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(result)
# → The meeting is tomorrow at 3 PM and we need to prepare the slides.
```
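Note that `mlx_lm.generate` decodes greedily by default (temperature 0), so repeated runs on the same prompt return identical text; this is what the 5-pass consistency figure above reflects.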
## Training

- Base model: Qwen/Qwen2.5-1.5B-Instruct (4-bit)
- Method: LoRA fine-tuning via `mlx-lm` on Apple Silicon
- Dataset: 44,000+ examples combining hand-crafted multilingual data (43 languages) with curated public ASR datasets
- Training: 1,500 iterations, best validation loss 0.489 at iteration 1,000
- Post-training: fused LoRA weights → dequantized to fp16 → re-quantized to 4-bit (see the sketch below)
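In `mlx-lm`, the first two post-training steps are typically handled by the `mlx_lm.fuse` utility (which can dequantize the fused model), and the final step by the `convert` API. A sketch of the re-quantization step, with illustrative placeholder paths:

```python
from mlx_lm import convert

# Re-quantize the dequantized fp16 fused checkpoint back to 4-bit
# with group size 64. Both paths are illustrative placeholders.
convert(
    hf_path="suprflow-fused-fp16",
    mlx_path="suprflow-qwen25-1.5b-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```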
## Supported languages
English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Turkish, Polish, Swedish, Danish, Norwegian, Finnish, Ukrainian, Thai, Vietnamese, Indonesian, Tamil, Telugu, Bengali, Urdu, Greek, Czech, Romanian, Hungarian, Hebrew, Bulgarian, Croatian, Catalan
## Used by
This model powers on-device text formatting in SuprFlow โ a macOS dictation app.