# SuprFlow Qwen 2.5 1.5B – ASR Post-Processing (4-bit MLX)
A fine-tuned Qwen 2.5 1.5B Instruct model optimized for cleaning up raw speech-to-text transcription output. Designed to run on-device on Apple Silicon Macs via MLX.
## What it does
Takes messy ASR output and produces clean, formatted text:
| Input | Output |
|---|---|
| um so I was thinking we should um probably go with option A | I was thinking we should probably go with option A. |
| the deadline is Thursday I mean Friday | The deadline is Friday. |
| [noise] so the plan is to deploy tomorrow [laughter] | The plan is to deploy tomorrow. |
| pues eh estaba pensando que deberíamos ir con la opción A | Estaba pensando que deberíamos ir con la opción A. |
## Key features

- Filler word removal – removes "um", "uh", "like", "you know", "basically", "actually", and language-specific fillers across 35+ languages
- Noise marker cleanup – strips `[noise]`, `[music]`, `[laughter]`, and similar markers
- Self-correction handling – "Thursday I mean Friday" → "Friday"
- Punctuation & capitalization – adds proper sentence structure
- Technical term preservation – correctly formats CI/CD, API, JSON, JWT, etc.
- Question passthrough – questions are formatted as questions, never answered
- Mode-aware formatting – supports `[Style: ...]` prefixes for message, email, note, and meeting formats (see the sketch after this list)
- Multilingual – trained on 35+ languages including CJK, Arabic, Hindi, and European languages
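A minimal sketch of mode-aware formatting. The transcript, the shortened system prompt, and the assumption that the style tag is simply prepended to the raw transcript in the user turn are all illustrative; the full system prompt is shown under Usage below.

```python
from mlx_lm import load, generate

model, tokenizer = load("SamAmeer/suprflow-qwen25-1.5b-4bit")

# Shortened stand-in; use the full system prompt from the Usage section.
system_prompt = "You are a text formatting tool. Clean up raw speech-to-text output."

# Assumption: the [Style: ...] tag is prepended to the raw transcript.
raw = "um hey can you send me the uh report by end of day"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"[Style: email] {raw}"},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```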
## Performance

Benchmarked on an Apple M4 (24 GB):
| Metric | Value |
|---|---|
| Average inference time | 0.26s |
| Model size | 828 MB |
| Quantization | 4-bit (affine, group_size=64) |
| Consistency (5-pass) | 100% deterministic |
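The numbers above can be sanity-checked with a timing harness along these lines (a sketch, not the exact benchmark script; the test prompt is illustrative):

```python
import time
from mlx_lm import load, generate

model, tokenizer = load("SamAmeer/suprflow-qwen25-1.5b-4bit")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "um so the meeting is um tomorrow at 3 pm"}],
    add_generation_prompt=True,
    tokenize=False,
)

# Run 5 passes, timing each one and checking output stability.
outputs, times = [], []
for _ in range(5):
    start = time.perf_counter()
    outputs.append(generate(model, tokenizer, prompt=prompt, max_tokens=128))
    times.append(time.perf_counter() - start)

print(f"avg {sum(times) / len(times):.2f}s, deterministic: {len(set(outputs)) == 1}")
```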
## Usage with MLX

```python
from mlx_lm import load, generate

# Load the 4-bit quantized model and tokenizer from the Hub.
model, tokenizer = load("SamAmeer/suprflow-qwen25-1.5b-4bit")

messages = [
    {"role": "system", "content": "You are a text formatting tool. Your ONLY job is to clean up raw speech-to-text transcription output. You must NEVER answer questions, add opinions, or generate new content.\n\nRules:\n- Fix punctuation, capitalization, and grammar\n- Fix speech-to-text errors (e.g., homophones, misheard words)\n- Remove filler words (um, uh, you know, like, basically, actually)\n- Remove noise markers ([noise], [music], [laughter])\n- If numbered items or steps are mentioned, format as a numbered list\n- If unordered items are listed, format as bullet points\n- Keep meaning exactly the same - do not add, remove, or rephrase content\n- Return ONLY the cleaned text with no explanations or commentary"},
    {"role": "user", "content": "um so like the meeting is tomorrow at 3 pm and we need to um prepare the slides"},
]

# Render the chat template and generate the cleaned transcript.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
result = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(result)
# → The meeting is tomorrow at 3 PM and we need to prepare the slides.
```
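Note that `mlx_lm.generate` decodes greedily by default (temperature 0), so repeated runs on the same prompt return identical text; this is what the 5-pass consistency figure above reflects.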
## Training

- Base model: Qwen/Qwen2.5-1.5B-Instruct (4-bit)
- Method: LoRA fine-tuning via `mlx-lm` on Apple Silicon
- Dataset: 44,000+ examples combining hand-crafted multilingual data (43 languages) with curated public ASR datasets
- Training: 1,500 iterations, best validation loss 0.489 at iteration 1,000
- Post-training: fused LoRA weights → dequantized to fp16 → re-quantized to 4-bit (see the sketch below)
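In `mlx-lm`, the first two post-training steps are typically handled by the `mlx_lm.fuse` utility (which can dequantize the fused model), and the final step by the `convert` API. A sketch of the re-quantization step, with illustrative placeholder paths:

```python
from mlx_lm import convert

# Re-quantize the dequantized fp16 fused checkpoint back to 4-bit
# with group size 64. Both paths are illustrative placeholders.
convert(
    hf_path="suprflow-fused-fp16",
    mlx_path="suprflow-qwen25-1.5b-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```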
## Supported languages
English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Turkish, Polish, Swedish, Danish, Norwegian, Finnish, Ukrainian, Thai, Vietnamese, Indonesian, Tamil, Telugu, Bengali, Urdu, Greek, Czech, Romanian, Hungarian, Hebrew, Bulgarian, Croatian, Catalan
## Used by
This model powers on-device text formatting in SuprFlow โ a macOS dictation app.