# MedASR-MLX (int8)
8-bit quantized version of MedASR-MLX: 40% smaller with zero quality loss.
Google's MedASR 105M Conformer-CTC, converted to MLX and quantized to 8-bit affine (group size 64). Runs natively on Apple Silicon with 0.0% WER degradation vs the original PyTorch model.
Full precision version: `ainergiz/medasr-mlx-fp16` (201 MB)
## Model Details
| Property | Value |
|---|---|
| Base model | google/medasr → ainergiz/medasr-mlx-fp16 |
| Architecture | LASR Conformer-CTC (17 layers, 512 hidden, 8 heads) |
| Parameters | 105M |
| Weights | 121 MB (int8, affine, group_size=64) |
| Quantization | 8-bit affine on Linear/Embedding layers; Conv layers remain fp16 |
| Vocab | 512 tokens (SentencePiece) |
| Audio input | 16 kHz mono, 128-bin mel spectrogram |
| Framework | MLX (Apple Silicon native) |
## Performance
Benchmarked on an Apple M4 Pro (24 GB) with a 43.8-second medical audio clip:
| Metric | int8 (this model) | fp16 | HF PyTorch (fp32) |
|---|---|---|---|
| Latency | 0.16s | 0.09s | 0.9-1.6s |
| Real-Time Factor | 0.004 | 0.002 | 0.02-0.04 |
| Weights on disk | 121 MB | 201 MB | ~421 MB |
| WER vs PyTorch | 0.0% | 0.0% | (baseline) |
| Token agreement | 100% | 100% | (baseline) |
Lossless quantization: 40% smaller with identical output tokens.
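The Real-Time Factor row in the table is simply processing latency divided by audio duration; values below 1.0 mean faster than real time. A quick check with the int8 numbers above (the helper function is illustrative, not part of the repo):

```python
# RTF = latency / audio duration; < 1.0 means faster than real time.
def real_time_factor(latency_s: float, audio_duration_s: float) -> float:
    return latency_s / audio_duration_s

rtf = real_time_factor(0.16, 43.8)  # int8 latency on the 43.8 s benchmark clip
print(round(rtf, 3))  # 0.004, matching the table
```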
## Quantization Details
- Method: `mlx.nn.quantize` (affine mode)
- Bits: 8
- Group size: 64
- Target modules: Linear and Embedding layers (MLX default predicate)
- Convolution layers: remain in float16 (quantizing Conv1d destroys accuracy for this architecture)
- Source: quantized from `artifacts/medasr-mlx-fp16`
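The affine scheme above can be illustrated in plain NumPy (a sketch of the math, not the repo's or MLX's actual code): each group of 64 weights gets its own scale and minimum, and values are rounded into 8 bits, so the worst-case reconstruction error per weight is about half a quantization step.

```python
import numpy as np

# 8-bit affine quantization with per-group scale/offset, group_size=64.
def affine_quantize(w: np.ndarray, bits: int = 8, group_size: int = 64):
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.round((groups - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def affine_dequantize(q, scale, w_min, shape):
    return (q.astype(np.float32) * scale + w_min).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)  # toy weight matrix
q, s, m = affine_quantize(w)
w_hat = affine_dequantize(q, s, m, w.shape)
print(np.abs(w - w_hat).max())  # small: bounded by scale/2 per group
```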
## Usage
### Requirements
```shell
pip install mlx numpy soundfile
```
You also need the model code from the MedASR-MLX repository:
```shell
git clone https://github.com/ainergiz/medasr-mlx.git
cd medasr-mlx
```
### Transcribe audio
```python
from model import MedASRModel
from audio_utils import load_audio_mono
import mlx.core as mx

# Load model (automatically applies int8 quantization from config)
model_dir = "artifacts/medasr-mlx-int8"  # or download from HF
model = MedASRModel.from_pretrained(model_dir)

# Load audio and run inference (same API as fp16)
audio = load_audio_mono("your_audio.wav", target_sr=16000)
# ... (see transcribe_mlx.py for full pipeline)
```
### Full pipeline
```shell
python main.py audio/eval-consultation/clip_0001.wav
```
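Since this is a Conformer-CTC model, the per-frame logits it emits are turned into tokens by CTC decoding. A minimal greedy decoder looks like this (an illustration of the standard algorithm, not the repo's implementation; blank id 0 is an assumption):

```python
import numpy as np

# Greedy CTC decoding: argmax per frame, collapse repeats, drop the blank.
def ctc_greedy_decode(logits: np.ndarray, blank_id: int = 0) -> list[int]:
    ids = logits.argmax(axis=-1)
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(int(i))
        prev = i
    return out

# Toy per-frame one-hot "logits" over a 5-token vocab (0 = blank).
frames = np.eye(5)[[0, 3, 3, 0, 2, 2, 2, 4]]
print(ctc_greedy_decode(frames))  # [3, 2, 4]
```

Repeated frame ids collapse to one token unless a blank separates them, which is how CTC aligns variable-length audio to a shorter token sequence.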
## Intended Use
MedASR is designed for medical speech recognition: doctor-patient conversations, clinical dictation, and medical terminology. This int8 variant is ideal when storage is constrained but you need lossless accuracy.
This model is not intended for clinical diagnosis or treatment without appropriate validation and regulatory authorization.
## Limitations
- English only
- Optimized for medical-domain speech
- Raw output includes formatting tokens (`{period}`, `{comma}`, etc.) requiring post-processing
- Requires Apple Silicon hardware (M1+ Mac or A17+ iPhone)
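A minimal sketch of the post-processing step for the formatting tokens mentioned above. Only `{period}` and `{comma}` are named on this card, so the mapping below covers just those; how the repo's own pipeline resolves them may differ.

```python
import re

# Assumed mapping from formatting tokens to punctuation (illustrative).
TOKEN_MAP = {"{period}": ".", "{comma}": ","}

def postprocess(raw: str) -> str:
    # Attach punctuation to the preceding word, then normalize whitespace.
    for tok, punct in TOKEN_MAP.items():
        raw = raw.replace(" " + tok, punct).replace(tok, punct)
    return re.sub(r"\s+", " ", raw).strip()

print(postprocess("patient presents with chest pain {comma} shortness of breath {period}"))
# -> "patient presents with chest pain, shortness of breath."
```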
## License
The use of this model is governed by the Health AI Developer Foundations Terms of Use. Source code components are licensed under Apache 2.0.
HAI-DEF is provided under and subject to the Health AI Developer Foundations Terms of Use.
## Citation
```bibtex
@misc{medasr-mlx,
  title={MedASR-MLX: On-Device Medical Speech Recognition for Apple Silicon},
  author={Ali Ihsan Nergiz},
  year={2026},
  url={https://huggingface.co/ainergiz/medasr-mlx-int8}
}
```
## Acknowledgments
- Google Health AI Developer Foundations for the original MedASR model
- MLX team at Apple for the framework