MedASR-MLX (fp16)
Google's MedASR 105M Conformer-CTC model converted to MLX for native Apple Silicon inference.
This is the first on-device deployment of MedASR. On an M4 Pro it runs faster than HF PyTorch (fp32), transcribing 43.8 seconds of medical audio in 0.09 seconds.
Also available:
- ainergiz/medasr-mlx-int8: 8-bit quantized (121 MB, lossless)
Model Details
| Property | Value |
|---|---|
| Base model | google/medasr |
| Architecture | LASR Conformer-CTC (17 layers, 512 hidden, 8 heads) |
| Parameters | 105M |
| Weights | 201 MB (float16) |
| Vocab | 512 tokens (SentencePiece) |
| Audio input | 16 kHz mono, 128-bin mel spectrogram |
| Positional encoding | RoPE (theta=10000) |
| Convolution | Depthwise, kernel size 32 |
| Framework | MLX (Apple Silicon native) |
Performance
Benchmarked on Apple M4 Pro (24 GB), 43.8-second medical audio clip:
| Metric | MedASR MLX (fp16) | HF PyTorch (fp32) |
|---|---|---|
| Latency | 0.09s | 0.9-1.6s |
| Real-Time Factor | 0.002 | 0.02-0.04 |
| Weights on disk | 201 MB | ~421 MB |
| WER vs PyTorch | 0.000 | n/a |
Based on the latencies above (0.09s vs 0.9-1.6s), MLX is roughly 10-17x faster than HF PyTorch on this clip.
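The Real-Time Factor row can be reproduced from the latency row. The sketch below assumes the usual definition (processing time divided by audio duration); the function name is illustrative and not part of the repository.

```python
def real_time_factor(latency_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return latency_s / audio_s


AUDIO_S = 43.8  # clip length from the benchmark above

mlx_rtf = real_time_factor(0.09, AUDIO_S)
pytorch_rtf = [real_time_factor(t, AUDIO_S) for t in (0.9, 1.6)]

print(f"MLX RTF: {mlx_rtf:.3f}")
print("PyTorch RTF: " + " to ".join(f"{r:.2f}" for r in pytorch_rtf))
```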
Quantization Variants
| Precision | Weights | WER vs PyTorch | Token Agreement |
|---|---|---|---|
| fp16 (this model) | 201 MB | 0.0% | 100% |
| int8 | 121 MB | 0.0% | 100% |
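For readers who want to run a similar parity check, here is a minimal sketch of the two metrics. It assumes WER is word-level edit distance divided by reference length and that token agreement is the fraction of positions where two token-ID sequences match; the exact definitions used for the table are not stated here, so treat both functions as illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def token_agreement(a: list, b: list) -> float:
    """Fraction of positions where the two sequences agree (assumed definition)."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))
```

Comparing the MLX transcript against the PyTorch transcript with `word_error_rate`, and the decoded token IDs with `token_agreement`, would give 0.0 and 1.0 when the two outputs are identical.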
Usage
Requirements
pip install mlx numpy soundfile
You also need the model code from the MedASR-MLX repository:
git clone https://github.com/ainergiz/medasr-mlx.git
cd medasr-mlx
Transcribe audio
from model import MedASRModel
from audio_utils import load_audio_mono
import mlx.core as mx
import numpy as np
import json
# Load model
model_dir = "artifacts/medasr-mlx-fp16" # or download from HF
model = MedASRModel.from_pretrained(model_dir)
# Load and preprocess audio
audio = load_audio_mono("your_audio.wav", target_sr=16000)
# Compute mel spectrogram (128-bin, 16kHz)
# ... (see transcribe_mlx.py for full feature extraction; `features` and `mask` below come from that step)
# Run inference
logits = model(features, mask)
# CTC greedy decode: per-frame argmax (repeated frames and blanks still need collapsing)
token_ids = mx.argmax(logits, axis=-1).tolist()[0]
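The per-frame argmax above still contains repeated predictions and CTC blank frames. A minimal sketch of the standard collapse step follows; `BLANK_ID = 0` is an assumption, so check the tokenizer or model config for the actual blank index. The function name is illustrative.

```python
from itertools import groupby

BLANK_ID = 0  # assumption: verify the blank index against the model's tokenizer/config


def ctc_greedy_collapse(frame_ids, blank_id=BLANK_ID):
    """Merge runs of identical frame predictions, then drop blank frames."""
    return [tok for tok, _ in groupby(frame_ids) if tok != blank_id]


# A blank between two identical tokens keeps them as separate outputs.
collapsed = ctc_greedy_collapse([0, 7, 7, 0, 7, 3, 3, 0])
```

The collapsed IDs would then be mapped to text with the model's SentencePiece vocabulary.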
Full pipeline (audio to patient summary)
python main.py audio/eval-consultation/clip_0001.wav
This runs MedASR for transcription, then MedGemma 1.5 4B for a plain-language patient summary.
Architecture
The model is a 17-layer Conformer encoder with CTC output:
Audio (16kHz) → 128-bin Mel Spectrogram
  → Subsampling (2× Conv1d stride-2, 4x total reduction)
  → 17× Conformer Blocks:
      ├─ Feed-Forward (½ residual, SiLU, 2048 intermediate)
      ├─ Multi-Head Self-Attention (8 heads, 64 dim, RoPE)
      ├─ Depthwise Conv (kernel=32)
      └─ Feed-Forward (½ residual)
  → CTC Head (Conv1d 512→512)
  → 512-token vocabulary (SentencePiece)
Weight conversion
All 368 parameter tensors and 51 BatchNorm buffers were converted from PyTorch to MLX format. Conv1d weights are transposed from PyTorch [out, in, kernel] to MLX [out, kernel, in] layout. Conversion achieves exact parity (0.0% WER, 100% token agreement).
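The Conv1d layout change is a swap of the last two axes. The sketch below illustrates it on plain nested lists so it runs without extra dependencies; it is not the repository's conversion script, and the function name is hypothetical.

```python
def conv1d_weight_to_mlx(weight):
    """Reorder a Conv1d weight from [out, in, kernel] to [out, kernel, in]."""
    # For each output filter, zip(*rows) turns `in` rows of length `kernel`
    # into `kernel` rows of length `in`.
    return [[list(tap) for tap in zip(*out_filter)] for out_filter in weight]


# Tiny example: 1 output channel, 2 input channels, kernel size 3.
pt_weight = [[[1, 2, 3],
              [4, 5, 6]]]
mlx_weight = conv1d_weight_to_mlx(pt_weight)  # shape [1, 3, 2]
```

With NumPy arrays the same reordering is `weight.transpose(0, 2, 1)`.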
Intended Use
MedASR is designed for medical speech recognition: doctor-patient conversations, clinical dictation, and medical terminology. It is part of Google's Health AI Developer Foundations (HAI-DEF).
This MLX conversion enables on-device inference on Apple Silicon (Mac, iPhone 15 Pro+) with no cloud dependency, making it suitable for privacy-sensitive healthcare applications.
This model is not intended for clinical diagnosis or treatment without appropriate validation and regulatory authorization.
Limitations
- English only
- Optimized for medical domain speech; general-purpose ASR accuracy may vary
- Raw output includes formatting tokens ({period}, {comma}, {new paragraph}) that require post-processing
- Requires Apple Silicon hardware (M1+ Mac or A17+ iPhone)
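As a rough illustration of the post-processing mentioned above, the sketch below replaces the three formatting tokens named in the list with punctuation. The mapping and function name are assumptions; the model may emit other tokens that are not covered here.

```python
import re

# Assumed mapping; only these three tokens are named in this model card.
FORMAT_TOKENS = {
    "{period}": ".",
    "{comma}": ",",
    "{new paragraph}": "\n\n",
}


def apply_formatting_tokens(text: str) -> str:
    """Replace spoken-formatting tokens with the punctuation they stand for."""
    for token, replacement in FORMAT_TOKENS.items():
        # Drop whitespace before the token so punctuation attaches to the previous word.
        text = re.sub(r"\s*" + re.escape(token), replacement, text)
    # Remove spaces left at the start of a new paragraph.
    text = re.sub(r"\n\n[ \t]+", "\n\n", text)
    return text.strip()


raw = "patient reports chest pain {comma} onset two days ago {period} {new paragraph} plan is ecg {period}"
clean = apply_formatting_tokens(raw)
```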
License
The use of this model is governed by the Health AI Developer Foundations Terms of Use. Source code components are licensed under Apache 2.0.
HAI-DEF is provided under and subject to the Health AI Developer Foundations Terms of Use.
Citation
@misc{medasr-mlx,
title={MedASR-MLX: On-Device Medical Speech Recognition for Apple Silicon},
author={Ali Ihsan Nergiz},
year={2026},
url={https://huggingface.co/ainergiz/medasr-mlx-fp16}
}
Acknowledgments
- Google Health AI Developer Foundations for the original MedASR model
- MLX team at Apple for the framework