MedASR-MLX (fp16)

Google's MedASR 105M Conformer-CTC model converted to MLX for native Apple Silicon inference.

This is the first on-device deployment of MedASR. On an M4 Pro, the model runs faster than HF PyTorch (fp32), transcribing 43.8 seconds of medical audio in 0.09 seconds.

Also available: ainergiz/medasr-mlx-int8, an 8-bit quantized variant (121 MB, 0.0% WER vs PyTorch)

Model Details

Property	Value
Base model	google/medasr
Architecture	LASR Conformer-CTC (17 layers, 512 hidden, 8 heads)
Parameters	105M
Weights	201 MB (float16)
Vocab	512 tokens (SentencePiece)
Audio input	16 kHz mono, 128-bin mel spectrogram
Positional encoding	RoPE (theta=10000)
Convolution	Depthwise, kernel size 32
Framework	MLX (Apple Silicon native)

Performance

Benchmarked on Apple M4 Pro (24 GB), 43.8-second medical audio clip:

Metric	MedASR MLX (fp16)	HF PyTorch (fp32)
Latency	0.09s	0.9-1.6s
Real-Time Factor	0.002	0.02-0.04
Weights on disk	201 MB	~421 MB
WER vs PyTorch	0.000	—

MLX achieves a 6-17x speedup over HF PyTorch.
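
The Real-Time Factor row is latency divided by audio duration, so any value below 1 means faster than real time. A minimal sketch that reproduces the table's RTF figures from the reported latencies (the function name is illustrative and not part of the repository):

```python
def real_time_factor(latency_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return latency_s / audio_s


AUDIO_S = 43.8  # length of the benchmark clip in seconds

mlx_rtf = real_time_factor(0.09, AUDIO_S)
pt_rtf_low = real_time_factor(0.9, AUDIO_S)
pt_rtf_high = real_time_factor(1.6, AUDIO_S)

print(f"MLX fp16 RTF: {mlx_rtf:.3f}")
print(f"PyTorch fp32 RTF: {pt_rtf_low:.2f}-{pt_rtf_high:.2f}")
```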

Quantization Variants

Precision	Weights	WER vs PyTorch	Token Agreement
fp16 (this model)	201 MB	0.0%	100%
int8	121 MB	0.0%	100%

Usage

Requirements

pip install mlx numpy soundfile

You also need the model code from the MedASR-MLX repository:

git clone https://github.com/ainergiz/medasr-mlx.git
cd medasr-mlx

Transcribe audio

from model import MedASRModel
from audio_utils import load_audio_mono
import mlx.core as mx
import numpy as np
import json

# Load model
model_dir = "artifacts/medasr-mlx-fp16"  # or download from HF
model = MedASRModel.from_pretrained(model_dir)

# Load and preprocess audio
audio = load_audio_mono("your_audio.wav", target_sr=16000)

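# Compute mel spectrogram (128-bin, 16kHz)
# ... (see transcribe_mlx.py for full feature extraction)
# That step must define `features` and `mask`, which are used below.

# Run inference
logits = model(features, mask)

# CTC greedy decode: per-frame argmax for the first item in the batch
token_ids = mx.argmax(logits, axis=-1).tolist()[0]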
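
The per-frame argmax above is only the first half of CTC greedy decoding: consecutive repeats still have to be merged and blank tokens dropped before the ids are mapped to text. A minimal sketch in plain Python, assuming the blank token has id 0 (an assumption; check the model's vocabulary for the real value) and using a hypothetical helper name:

```python
from itertools import groupby

BLANK_ID = 0  # assumption: verify against the model's vocabulary


def ctc_collapse(frame_ids, blank_id=BLANK_ID):
    """Merge consecutive duplicate ids, then drop blanks (standard CTC greedy rule)."""
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


# Toy per-frame ids standing in for `token_ids` from the snippet above
example = [0, 7, 7, 0, 7, 12, 12, 0, 0, 3]
print(ctc_collapse(example))
```

Note that a blank between two identical ids keeps them as separate tokens, which is how CTC represents genuine repeats. The collapsed ids can then be passed to the SentencePiece vocabulary to recover text.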

Full pipeline (audio to patient summary)

python main.py audio/eval-consultation/clip_0001.wav

This runs MedASR for transcription, then MedGemma 1.5 4B for a plain-language patient summary.

Architecture

The model is a 17-layer Conformer encoder with CTC output:

Audio (16kHz) → 128-bin Mel Spectrogram
  → Subsampling (2× Conv1d stride-2, 4x total reduction)
  → 17× Conformer Blocks:
      ├─ Feed-Forward (½ residual, SiLU, 2048 intermediate)
      ├─ Multi-Head Self-Attention (8 heads, 64 dim, RoPE)
      ├─ Depthwise Conv (kernel=32)
      └─ Feed-Forward (½ residual)
  → CTC Head (Conv1d 512→512)
  → 512-token vocabulary (SentencePiece)

Weight conversion

All 368 parameter tensors and 51 BatchNorm buffers were converted from PyTorch to MLX format. Conv1d weights were transposed from the PyTorch [out, in, kernel] layout to the MLX [out, kernel, in] layout. The converted model reproduces the PyTorch output exactly (0.0% WER, 100% token agreement).
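
The Conv1d layout change amounts to swapping the last two axes of each weight tensor. A minimal NumPy sketch, using a random array in place of a real checkpoint tensor; the helper name is hypothetical, and the depthwise shape (512 channels, 1 input channel per group, kernel 32) is inferred from the model details rather than read from the weights:

```python
import numpy as np


def conv1d_torch_to_mlx(weight: np.ndarray) -> np.ndarray:
    """Reorder a Conv1d weight from [out, in, kernel] to [out, kernel, in]."""
    if weight.ndim != 3:
        raise ValueError(f"expected a 3-D Conv1d weight, got shape {weight.shape}")
    return np.ascontiguousarray(weight.transpose(0, 2, 1))


rng = np.random.default_rng(0)
# Stand-in for a depthwise Conv1d weight in PyTorch layout (assumed shape)
torch_style = rng.standard_normal((512, 1, 32)).astype(np.float16)
mlx_style = conv1d_torch_to_mlx(torch_style)
print(torch_style.shape, "->", mlx_style.shape)
```

Only the axis order changes; every value is preserved, so the transposition by itself introduces no numerical difference.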

Intended Use

MedASR is designed for medical speech recognition: doctor-patient conversations, clinical dictation, and speech dense with medical terminology. It is part of Google's Health AI Developer Foundations (HAI-DEF).

This MLX conversion enables on-device inference on Apple Silicon (Mac, iPhone 15 Pro+) with no cloud dependency, making it suitable for privacy-sensitive healthcare applications.

This model is not intended for clinical diagnosis or treatment without appropriate validation and regulatory authorization.

Limitations

English only
Optimized for medical domain speech; accuracy on general-purpose speech may be lower
Raw output includes formatting tokens ({period}, {comma}, {new paragraph}) that require post-processing
Requires Apple Silicon hardware (M1+ Mac or A17+ iPhone)
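
The formatting tokens listed above can be mapped to punctuation with a small post-processing pass. A minimal sketch covering only the three tokens named here; the helper name is hypothetical, and the assumption that tokens appear as literal brace-delimited text separated by spaces should be checked against real model output:

```python
import re

# Only the three tokens named in this card; real output may contain others.
FORMAT_TOKENS = {
    "{period}": ".",
    "{comma}": ",",
    "{new paragraph}": "\n\n",
}


def apply_formatting_tokens(raw: str) -> str:
    """Replace formatting tokens and tidy the whitespace around them."""
    text = raw
    for token, replacement in FORMAT_TOKENS.items():
        # Absorb whitespace before the token so "pain {comma}" becomes "pain,"
        text = re.sub(r"\s*" + re.escape(token), replacement, text)
    # Remove spaces or tabs left directly after a paragraph break
    text = re.sub(r"\n\n[ \t]+", "\n\n", text)
    return text.strip()


raw = "patient reports chest pain {comma} worse on exertion {period} {new paragraph} no fever {period}"
print(apply_formatting_tokens(raw))
```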

License

The use of this model is governed by the Health AI Developer Foundations Terms of Use. Source code components are licensed under Apache 2.0.

HAI-DEF is provided under and subject to the Health AI Developer Foundations Terms of Use.

Citation

@misc{medasr-mlx,
  title={MedASR-MLX: On-Device Medical Speech Recognition for Apple Silicon},
  author={Ali Ihsan Nergiz},
  year={2026},
  url={https://huggingface.co/ainergiz/medasr-mlx-fp16}
}

Acknowledgments

Google Health AI Developer Foundations for the original MedASR model
MLX team at Apple for the framework
