# MedASR-MLX (int8)
8-bit quantized version of MedASR-MLX: 40% smaller with zero quality loss.
Google's MedASR 105M Conformer-CTC, converted to MLX and quantized to 8-bit affine (group size 64). Runs natively on Apple Silicon with 0.0% WER degradation vs the original PyTorch model.
Full precision version: `ainergiz/medasr-mlx-fp16` (201 MB)
## Model Details
| Property | Value |
|---|---|
| Base model | google/medasr → ainergiz/medasr-mlx-fp16 |
| Architecture | LASR Conformer-CTC (17 layers, 512 hidden, 8 heads) |
| Parameters | 105M |
| Weights | 121 MB (int8, affine, group_size=64) |
| Quantization | 8-bit affine on Linear/Embedding layers; Conv layers remain fp16 |
| Vocab | 512 tokens (SentencePiece) |
| Audio input | 16 kHz mono, 128-bin mel spectrogram |
| Framework | MLX (Apple Silicon native) |
## Performance
Benchmarked on an Apple M4 Pro (24 GB) with a 43.8-second medical audio clip:
| Metric | int8 (this model) | fp16 | HF PyTorch (fp32) |
|---|---|---|---|
| Latency | 0.16s | 0.09s | 0.9-1.6s |
| Real-Time Factor | 0.004 | 0.002 | 0.02-0.04 |
| Weights on disk | 121 MB | 201 MB | ~421 MB |
| WER vs PyTorch | 0.0% | 0.0% | (baseline) |
| Token agreement | 100% | 100% | (baseline) |
Lossless quantization: 40% smaller with identical output tokens.
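The Real-Time Factor row in the table is simply processing latency divided by audio duration; values below 1.0 mean faster than real time. A quick check with the int8 numbers above (the helper function is illustrative, not part of the repo):

```python
# RTF = latency / audio duration; < 1.0 means faster than real time.
def real_time_factor(latency_s: float, audio_duration_s: float) -> float:
    return latency_s / audio_duration_s

rtf = real_time_factor(0.16, 43.8)  # int8 latency on the 43.8 s benchmark clip
print(round(rtf, 3))  # 0.004, matching the table
```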
## Quantization Details
- Method: `mlx.nn.quantize` (affine mode)
- Bits: 8
- Group size: 64
- Target modules: Linear and Embedding layers (MLX default predicate)
- Convolution layers: remain in float16 (quantizing Conv1d destroys accuracy for this architecture)
- Source: quantized from `artifacts/medasr-mlx-fp16`
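The affine scheme above can be illustrated in plain NumPy (a sketch of the math, not the repo's or MLX's actual code): each group of 64 weights gets its own scale and minimum, and values are rounded into 8 bits, so the worst-case reconstruction error per weight is about half a quantization step.

```python
import numpy as np

# 8-bit affine quantization with per-group scale/offset, group_size=64.
def affine_quantize(w: np.ndarray, bits: int = 8, group_size: int = 64):
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.round((groups - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def affine_dequantize(q, scale, w_min, shape):
    return (q.astype(np.float32) * scale + w_min).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)  # toy weight matrix
q, s, m = affine_quantize(w)
w_hat = affine_dequantize(q, s, m, w.shape)
print(np.abs(w - w_hat).max())  # small: bounded by scale/2 per group
```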
## Usage
### Requirements
```shell
pip install mlx numpy soundfile
```
You also need the model code from the MedASR-MLX repository:
```shell
git clone https://github.com/ainergiz/medasr-mlx.git
cd medasr-mlx
```
### Transcribe audio
```python
from model import MedASRModel
from audio_utils import load_audio_mono
import mlx.core as mx

# Load model (automatically applies int8 quantization from config)
model_dir = "artifacts/medasr-mlx-int8"  # or download from HF
model = MedASRModel.from_pretrained(model_dir)

# Load audio and run inference (same API as fp16)
audio = load_audio_mono("your_audio.wav", target_sr=16000)
# ... (see transcribe_mlx.py for full pipeline)
```
### Full pipeline
```shell
python main.py audio/eval-consultation/clip_0001.wav
```
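Since this is a Conformer-CTC model, the per-frame logits it emits are turned into tokens by CTC decoding. A minimal greedy decoder looks like this (an illustration of the standard algorithm, not the repo's implementation; blank id 0 is an assumption):

```python
import numpy as np

# Greedy CTC decoding: argmax per frame, collapse repeats, drop the blank.
def ctc_greedy_decode(logits: np.ndarray, blank_id: int = 0) -> list[int]:
    ids = logits.argmax(axis=-1)
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(int(i))
        prev = i
    return out

# Toy per-frame one-hot "logits" over a 5-token vocab (0 = blank).
frames = np.eye(5)[[0, 3, 3, 0, 2, 2, 2, 4]]
print(ctc_greedy_decode(frames))  # [3, 2, 4]
```

Repeated frame ids collapse to one token unless a blank separates them, which is how CTC aligns variable-length audio to a shorter token sequence.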
## Intended Use
MedASR is designed for medical speech recognition: doctor-patient conversations, clinical dictation, and medical terminology. This int8 variant is ideal when storage is constrained but you need lossless accuracy.
This model is not intended for clinical diagnosis or treatment without appropriate validation and regulatory authorization.
## Limitations
- English only
- Optimized for medical-domain speech
- Raw output includes formatting tokens (`{period}`, `{comma}`, etc.) requiring post-processing
- Requires Apple Silicon hardware (M1+ Mac or A17+ iPhone)
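A minimal sketch of the post-processing step for the formatting tokens mentioned above. Only `{period}` and `{comma}` are named on this card, so the mapping below covers just those; how the repo's own pipeline resolves them may differ.

```python
import re

# Assumed mapping from formatting tokens to punctuation (illustrative).
TOKEN_MAP = {"{period}": ".", "{comma}": ","}

def postprocess(raw: str) -> str:
    # Attach punctuation to the preceding word, then normalize whitespace.
    for tok, punct in TOKEN_MAP.items():
        raw = raw.replace(" " + tok, punct).replace(tok, punct)
    return re.sub(r"\s+", " ", raw).strip()

print(postprocess("patient presents with chest pain {comma} shortness of breath {period}"))
# -> "patient presents with chest pain, shortness of breath."
```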
## License
The use of this model is governed by the Health AI Developer Foundations Terms of Use. Source code components are licensed under Apache 2.0.
HAI-DEF is provided under and subject to the Health AI Developer Foundations Terms of Use.
## Citation
```bibtex
@misc{medasr-mlx,
  title={MedASR-MLX: On-Device Medical Speech Recognition for Apple Silicon},
  author={Ali Ihsan Nergiz},
  year={2026},
  url={https://huggingface.co/ainergiz/medasr-mlx-int8}
}
```
## Acknowledgments
- Google Health AI Developer Foundations for the original MedASR model
- MLX team at Apple for the framework