MedASR-MLX (fp16)

Google's MedASR 105M Conformer-CTC model converted to MLX for native Apple Silicon inference.

This is the first on-device deployment of MedASR. On an M4 Pro the model runs faster than the HF PyTorch (fp32) reference, transcribing 43.8 seconds of medical audio in 0.09 seconds.

Also available: ainergiz/medasr-mlx-int8, an 8-bit quantized variant (121 MB, lossless).

Model Details

| Property | Value |
|---|---|
| Base model | google/medasr |
| Architecture | LASR Conformer-CTC (17 layers, 512 hidden, 8 heads) |
| Parameters | 105M |
| Weights | 201 MB (float16) |
| Vocab | 512 tokens (SentencePiece) |
| Audio input | 16 kHz mono, 128-bin mel spectrogram |
| Positional encoding | RoPE (theta=10000) |
| Convolution | Depthwise, kernel size 32 |
| Framework | MLX (Apple Silicon native) |

Performance

Benchmarked on an Apple M4 Pro (24 GB) with a 43.8-second medical audio clip:

| Metric | MedASR MLX (fp16) | HF PyTorch (fp32) |
|---|---|---|
| Latency | 0.09 s | 0.9-1.6 s |
| Real-Time Factor | 0.002 | 0.02-0.04 |
| Weights on disk | 201 MB | ~421 MB |
| WER vs PyTorch | 0.000 | n/a |

MLX achieves a 6-17x speedup over HF PyTorch on this clip.
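
Real-time factor is latency divided by audio duration, so 0.09 s over the 43.8 s clip gives roughly 0.002. If you want to reproduce the latency measurement, the sketch below shows one assumed approach (the actual benchmark script is not part of this card). The explicit mx.eval calls matter because MLX evaluates lazily; model, features, and mask are the objects built in the Usage section below.

import time
import mlx.core as mx

# Assumes `model`, `features`, and `mask` are prepared as in the Usage section below.
mx.eval(model.parameters())            # materialize weights before timing
start = time.perf_counter()
logits = model(features, mask)
mx.eval(logits)                        # force MLX's lazy graph to execute before stopping the clock
latency = time.perf_counter() - start

audio_seconds = 43.8                   # duration of the benchmark clip
print(f"latency = {latency:.3f} s, RTF = {latency / audio_seconds:.4f}")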

Quantization Variants

| Precision | Weights | WER vs PyTorch | Token Agreement |
|---|---|---|---|
| fp16 (this model) | 201 MB | 0.0% | 100% |
| int8 | 121 MB | 0.0% | 100% |

Usage

Requirements

pip install mlx numpy soundfile

You also need the model code from the MedASR-MLX repository:

git clone https://github.com/ainergiz/medasr-mlx.git
cd medasr-mlx
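
The converted weights themselves can also be fetched directly from the Hub with huggingface_hub (pip install huggingface_hub). The snippet below is a sketch; from_pretrained in this repository determines the expected file layout.

from huggingface_hub import snapshot_download

# Download the fp16 repo to the local HF cache and return its directory path
model_dir = snapshot_download("ainergiz/medasr-mlx-fp16")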

Transcribe audio

from model import MedASRModel
from audio_utils import load_audio_mono
import mlx.core as mx
import numpy as np
import json

# Load model
model_dir = "artifacts/medasr-mlx-fp16"  # or download from HF
model = MedASRModel.from_pretrained(model_dir)

# Load and preprocess audio
audio = load_audio_mono("your_audio.wav", target_sr=16000)

# Compute mel spectrogram (128-bin, 16kHz)
# ... (see transcribe_mlx.py for full feature extraction)

# Run inference
logits = model(features, mask)

# CTC greedy decode
token_ids = mx.argmax(logits, axis=-1).tolist()[0]
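
The argmax above still contains CTC blanks and repeated frames. The continuation below is a minimal greedy-decode sketch; it assumes the blank id is 0 and that the SentencePiece model ships as tokenizer.model in model_dir (pip install sentencepiece), so check transcribe_mlx.py for the actual blank index and tokenizer path.

import sentencepiece as spm

BLANK_ID = 0  # assumption; confirm against the model's vocabulary

# Collapse consecutive repeats, then drop blanks (standard CTC greedy decoding)
collapsed = []
prev = None
for t in token_ids:
    if t != prev and t != BLANK_ID:
        collapsed.append(t)
    prev = t

# Detokenize with the SentencePiece vocabulary (file name is an assumption)
sp = spm.SentencePieceProcessor(model_file=f"{model_dir}/tokenizer.model")
print(sp.decode(collapsed))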

Full pipeline (audio to patient summary)

python main.py audio/eval-consultation/clip_0001.wav

This runs MedASR for transcription, then MedGemma 1.5 4B for a plain-language patient summary.

Architecture

The model is a 17-layer Conformer encoder with CTC output:

Audio (16 kHz) → 128-bin Mel Spectrogram
  → Subsampling (2× Conv1d stride-2, 4× total reduction)
  → 17× Conformer Blocks:
      ├─ Feed-Forward (½ residual, SiLU, 2048 intermediate)
      ├─ Multi-Head Self-Attention (8 heads, 64 dim, RoPE)
      ├─ Depthwise Conv (kernel=32)
      └─ Feed-Forward (½ residual)
  → CTC Head (Conv1d 512→512)
  → 512-token vocabulary (SentencePiece)
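
To make the block ordering concrete, the sketch below is a schematic MLX module following the macaron layout above. It is not the repository's implementation: masking, RoPE wiring, and the BatchNorm inside the convolution module are omitted, and the depthwise convolution assumes an MLX version with grouped Conv1d support.

import mlx.core as mx
import mlx.nn as nn

class ConformerBlockSketch(nn.Module):
    """Schematic macaron block: FF (1/2 residual) -> MHSA -> depthwise conv -> FF (1/2 residual)."""

    def __init__(self, dims=512, heads=8, ff_mult=4, kernel=32):
        super().__init__()
        self.ff1 = nn.Sequential(nn.LayerNorm(dims), nn.Linear(dims, ff_mult * dims),
                                 nn.SiLU(), nn.Linear(ff_mult * dims, dims))
        self.attn_norm = nn.LayerNorm(dims)
        self.attn = nn.MultiHeadAttention(dims, heads)
        self.conv_norm = nn.LayerNorm(dims)
        # Depthwise over time: groups == channels (needs grouped-conv support in MLX)
        self.dwconv = nn.Conv1d(dims, dims, kernel_size=kernel, padding=kernel // 2, groups=dims)
        self.ff2 = nn.Sequential(nn.LayerNorm(dims), nn.Linear(dims, ff_mult * dims),
                                 nn.SiLU(), nn.Linear(ff_mult * dims, dims))
        self.out_norm = nn.LayerNorm(dims)

    def __call__(self, x):                          # x: (batch, time, dims)
        x = x + 0.5 * self.ff1(x)                   # first feed-forward, half residual
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h)                  # self-attention (RoPE omitted in this sketch)
        c = self.dwconv(self.conv_norm(x))
        x = x + c[:, : x.shape[1], :]               # trim the extra frame from even-kernel padding
        x = x + 0.5 * self.ff2(x)                   # second feed-forward, half residual
        return self.out_norm(x)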

Weight conversion

All 368 parameter tensors and 51 BatchNorm buffers were converted from PyTorch to MLX format. Conv1d weights are transposed from PyTorch [out, in, kernel] to MLX [out, kernel, in] layout. Conversion achieves exact parity (0.0% WER, 100% token agreement).
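
That layout change is a single axis permutation per Conv1d weight. A sketch of the conversion step, with an example tensor standing in for a real checkpoint entry:

import numpy as np

# PyTorch Conv1d weight layout: [out_channels, in_channels, kernel]
w_torch = np.random.randn(512, 512, 32).astype(np.float16)  # example shape only

# MLX Conv1d weight layout: [out_channels, kernel, in_channels]
w_mlx = np.transpose(w_torch, (0, 2, 1))
assert w_mlx.shape == (512, 32, 512)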

Intended Use

MedASR is designed for medical speech recognition β€” doctor-patient conversations, clinical dictation, and medical terminology. It is part of Google's Health AI Developer Foundations (HAI-DEF).

This MLX conversion enables on-device inference on Apple Silicon (Mac, iPhone 15 Pro+) with no cloud dependency, making it suitable for privacy-sensitive healthcare applications.

This model is not intended for clinical diagnosis or treatment without appropriate validation and regulatory authorization.

Limitations

  • English only
  • Optimized for medical domain speech; general-purpose ASR accuracy may vary
  • Raw output includes formatting tokens ({period}, {comma}, {new paragraph}) that require post-processing
  • Requires Apple Silicon hardware (M1+ Mac or A17+ iPhone)

License

The use of this model is governed by the Health AI Developer Foundations Terms of Use. Source code components are licensed under Apache 2.0.

HAI-DEF is provided under and subject to the Health AI Developer Foundations Terms of Use.

Citation

@misc{medasr-mlx,
  title={MedASR-MLX: On-Device Medical Speech Recognition for Apple Silicon},
  author={Ali Ihsan Nergiz},
  year={2026},
  url={https://huggingface.co/ainergiz/medasr-mlx-fp16}
}

Acknowledgments
