SuprFlow Qwen 2.5 1.5B – ASR Post-Processing (4-bit MLX)

A fine-tuned Qwen 2.5 1.5B Instruct model optimized for cleaning up raw speech-to-text transcription output. Designed to run on-device on Apple Silicon Macs via MLX.

What it does

Takes messy ASR output and produces clean, formatted text:

Input:  um so I was thinking we should um probably go with option A
Output: I was thinking we should probably go with option A.

Input:  the deadline is Thursday I mean Friday
Output: The deadline is Friday.

Input:  [noise] so the plan is to deploy tomorrow [laughter]
Output: The plan is to deploy tomorrow.

Input (Spanish):  pues eh estaba pensando que deberíamos ir con la opción A
Output: Estaba pensando que deberíamos ir con la opción A.

Key features

  • Filler word removal: removes um, uh, like, you know, basically, actually, and language-specific fillers across 35+ languages
  • Noise marker cleanup: strips [noise], [music], [laughter], and similar markers
  • Self-correction handling: "Thursday I mean Friday" → "Friday"
  • Punctuation & capitalization: adds proper sentence structure
  • Technical term preservation: correctly formats CI/CD, API, JSON, JWT, etc.
  • Question passthrough: questions are formatted as questions, never answered
  • Mode-aware formatting: supports [Style: ...] prefixes for message, email, note, and meeting formats (see the sketch after this list)
  • Multilingual: trained on 35+ languages, including CJK, Arabic, Hindi, and the major European languages
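A minimal sketch of mode-aware input, assuming the [Style: ...] prefix is prepended to the raw transcript inside the user message (the mode names come from the list above; the sample text and variable names are illustrative):

raw = "um hey jane can you uh send over the Q3 numbers by friday"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},  # the formatting prompt from the Usage section below
    # Hypothetical email-mode request; "email" is one of the modes listed above.
    {"role": "user", "content": f"[Style: email] {raw}"},
]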

Performance

Benchmarked on an Apple M4 (24 GB):

Metric                   Value
Average inference time   0.26 s
Model size               828 MB
Quantization             4-bit (affine, group_size=64)
Consistency (5-pass)     100% deterministic
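The card does not include the benchmark harness; a minimal timing sketch along these lines (the prompt, token budget, and warm-up policy are assumptions of this example) should produce comparable numbers on Apple Silicon:

import time

from mlx_lm import load, generate

model, tokenizer = load("SamAmeer/suprflow-qwen25-1.5b-4bit")
messages = [{"role": "user", "content": "um so the uh meeting moved to three pm"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Warm-up pass so one-time weight loading does not skew the measurement.
generate(model, tokenizer, prompt=prompt, max_tokens=64)

start = time.perf_counter()
generate(model, tokenizer, prompt=prompt, max_tokens=64)
print(f"inference time: {time.perf_counter() - start:.2f}s")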

Usage with MLX

from mlx_lm import load, generate

model, tokenizer = load("SamAmeer/suprflow-qwen25-1.5b-4bit")

messages = [
    {"role": "system", "content": "You are a text formatting tool. Your ONLY job is to clean up raw speech-to-text transcription output. You must NEVER answer questions, add opinions, or generate new content.\n\nRules:\n- Fix punctuation, capitalization, and grammar\n- Fix speech-to-text errors (e.g., homophones, misheard words)\n- Remove filler words (um, uh, you know, like, basically, actually)\n- Remove noise markers ([noise], [music], [laughter])\n- If numbered items or steps are mentioned, format as a numbered list\n- If unordered items are listed, format as bullet points\n- Keep meaning exactly the same; do not add, remove, or rephrase content\n- Return ONLY the cleaned text with no explanations or commentary"},
    {"role": "user", "content": "um so like the meeting is tomorrow at 3 pm and we need to um prepare the slides"},
]

# Render the chat template into a plain prompt string, then generate.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
result = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(result)
# -> The meeting is tomorrow at 3 PM and we need to prepare the slides.
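In application code it can help to wrap the round trip in a small helper; a sketch (the function name, default token budget, and .strip() call are choices of this example, not part of the model's API):

def clean_transcript(model, tokenizer, text, max_tokens=512):
    """Run one cleanup pass over raw ASR text and return the formatted result."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},  # the system prompt string shown above
        {"role": "user", "content": text},
    ]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens).strip()

print(clean_transcript(model, tokenizer, "uh the demo is um on thursday"))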

Training

  • Base model: Qwen/Qwen2.5-1.5B-Instruct (4-bit)
  • Method: LoRA fine-tuning via mlx-lm on Apple Silicon
  • Dataset: 44,000+ examples combining hand-crafted multilingual data (43 languages) with curated public ASR datasets
  • Training: 1,500 iterations; best validation loss 0.489 at iteration 1,000
  • Post-training: fused the LoRA weights, dequantized to fp16, then re-quantized to 4-bit (see the sketch below)
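mlx-lm exposes this conversion step as a Python function; a minimal sketch of the final re-quantization, assuming the fuse and dequantize steps already produced an fp16 checkpoint (both paths are placeholders):

from mlx_lm import convert

# Re-quantize the fused fp16 checkpoint to 4-bit with the card's group size.
convert(
    hf_path="./fused-fp16",                  # placeholder: dequantized fused model
    mlx_path="./suprflow-qwen25-1.5b-4bit",  # placeholder: output directory
    quantize=True,
    q_bits=4,
    q_group_size=64,
)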

Supported languages

English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Turkish, Polish, Swedish, Danish, Norwegian, Finnish, Ukrainian, Thai, Vietnamese, Indonesian, Tamil, Telugu, Bengali, Urdu, Greek, Czech, Romanian, Hungarian, Hebrew, Bulgarian, Croatian, Catalan

Used by

This model powers on-device text formatting in SuprFlow, a macOS dictation app.
