MedASR-MLX (int8)

An 8-bit quantized version of MedASR-MLX: 40% smaller with zero measured quality loss.

Google's MedASR 105M Conformer-CTC, converted to MLX and quantized to 8-bit affine (group size 64). Runs natively on Apple Silicon with 0.0% WER degradation vs the original PyTorch model.

Full precision version: ainergiz/medasr-mlx-fp16 (201 MB)

Model Details

| Property | Value |
|---|---|
| Base model | google/medasr → ainergiz/medasr-mlx-fp16 |
| Architecture | LASR Conformer-CTC (17 layers, 512 hidden, 8 heads) |
| Parameters | 105M |
| Weights | 121 MB (int8, affine, group_size=64) |
| Quantization | 8-bit affine on Linear/Embedding layers; Conv layers remain fp16 |
| Vocab | 512 tokens (SentencePiece) |
| Audio input | 16 kHz mono, 128-bin mel spectrogram |
| Framework | MLX (Apple Silicon native) |
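
The audio front end expects exactly this format, so recordings at other sample rates or with multiple channels need a small preprocessing step first. Below is a minimal sketch using only the listed dependencies (numpy, soundfile); the repository's load_audio_mono helper presumably covers the same ground, and the linear-interpolation resampler here is purely illustrative:

```python
import numpy as np
import soundfile as sf

def prepare_audio(path: str, target_sr: int = 16000) -> np.ndarray:
    """Load a file, downmix to mono, and resample to 16 kHz (illustrative only)."""
    audio, sr = sf.read(path, dtype="float32")
    if audio.ndim > 1:                      # stereo/multichannel -> mono
        audio = audio.mean(axis=1)
    if sr != target_sr:                     # crude linear-interpolation resample
        duration = audio.shape[0] / sr
        n_out = int(round(duration * target_sr))
        t_in = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
        t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
        audio = np.interp(t_out, t_in, audio).astype(np.float32)
    return audio
```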

Performance

Benchmarked on an Apple M4 Pro (24 GB) using a 43.8-second medical audio clip:

| Metric | int8 (this model) | fp16 | HF PyTorch (fp32) |
|---|---|---|---|
| Latency | 0.16 s | 0.09 s | 0.9–1.6 s |
| Real-time factor | 0.004 | 0.002 | 0.02–0.04 |
| Weights on disk | 121 MB | 201 MB | ~421 MB |
| WER vs PyTorch | 0.0% | 0.0% | – |
| Token agreement | 100% | 100% | – |

Lossless quantization: 40% smaller with identical output tokens.
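
The real-time factor above is simply processing latency divided by audio duration (0.16 s / 43.8 s ≈ 0.004 for the int8 model), and the WER and token-agreement figures come from comparing decoded output against the PyTorch reference. A rough, illustrative sketch of how such a comparison could be scripted; these helpers are not part of the repository:

```python
def real_time_factor(latency_s: float, audio_s: float) -> float:
    """Processing time divided by audio length, e.g. 0.16 / 43.8 -> ~0.004."""
    return latency_s / audio_s

def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word error rate: Levenshtein edit distance over words / reference length."""
    d = list(range(len(hypothesis) + 1))           # DP row for the empty reference
    for i, ref_word in enumerate(reference, start=1):
        prev_diag, d[0] = d[0], i
        for j, hyp_word in enumerate(hypothesis, start=1):
            cur = min(
                d[j] + 1,                           # deletion
                d[j - 1] + 1,                       # insertion
                prev_diag + (ref_word != hyp_word)  # substitution or match
            )
            prev_diag, d[j] = d[j], cur
    return d[-1] / max(len(reference), 1)

def token_agreement(a: list[int], b: list[int]) -> float:
    """Fraction of positions where two decoded token sequences match."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)
```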

Quantization Details

  • Method: mlx.nn.quantize (affine mode)
  • Bits: 8
  • Group size: 64
  • Target modules: Linear and Embedding layers (MLX default predicate)
  • Convolution layers: Remain in float16 (quantizing Conv1d destroys accuracy for this architecture)
  • Source: Quantized from artifacts/medasr-mlx-fp16
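
As a rough sketch of what this looks like with the MLX API, assuming `model` is the already-loaded fp16 module (the explicit class predicate below simply mirrors the default behaviour of only converting layers that have a quantized counterpart):

```python
import mlx.nn as nn

def quantize_int8(model: nn.Module) -> nn.Module:
    """Quantize Linear/Embedding layers to 8-bit affine, group size 64, in place."""
    nn.quantize(
        model,
        group_size=64,
        bits=8,
        # Only layers with a quantized counterpart are converted;
        # Conv1d has none here, so the convolutional front end stays fp16.
        class_predicate=lambda _path, m: isinstance(m, (nn.Linear, nn.Embedding)),
    )
    return model
```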

Usage

Requirements

```bash
pip install mlx numpy soundfile
```

You also need the model code from the MedASR-MLX repository:

```bash
git clone https://github.com/ainergiz/medasr-mlx.git
cd medasr-mlx
```
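
If you would rather pull the int8 weights from the Hub than use a local artifacts directory, huggingface_hub (an extra dependency, not in the minimal requirements above) can fetch them; a short sketch:

```python
# Optional: fetch the int8 weights from the Hub (requires `pip install huggingface_hub`).
from huggingface_hub import snapshot_download

model_dir = snapshot_download("ainergiz/medasr-mlx-int8")
print(model_dir)  # local path to pass to MedASRModel.from_pretrained
```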

Transcribe audio

```python
from model import MedASRModel
from audio_utils import load_audio_mono
import mlx.core as mx

# Load model (automatically applies int8 quantization from config)
model_dir = "artifacts/medasr-mlx-int8"  # or download from HF
model = MedASRModel.from_pretrained(model_dir)

# Load 16 kHz mono audio and run inference (same API as fp16)
audio = load_audio_mono("your_audio.wav", target_sr=16000)
# ... (see transcribe_mlx.py for full pipeline)
```
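
The elided part of the pipeline is the usual Conformer-CTC recipe: mel features go through the encoder and the per-frame logits are greedy-decoded by collapsing repeats and dropping blanks. An illustrative sketch of that decoding step, assuming logits of shape (frames, vocab) and a blank id of 0; transcribe_mlx.py in the repository is the authoritative implementation:

```python
import mlx.core as mx

def ctc_greedy_decode(logits: mx.array, blank_id: int = 0) -> list[int]:
    """Collapse repeated frame predictions and strip blanks (standard CTC decoding)."""
    best = mx.argmax(logits, axis=-1).tolist()   # most likely token per frame
    tokens, prev = [], None
    for tok in best:
        if tok != prev and tok != blank_id:      # keep first of each run, skip blanks
            tokens.append(tok)
        prev = tok
    return tokens
```

The resulting token IDs are then mapped back to text with the 512-token SentencePiece vocabulary.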

Full pipeline

```bash
python main.py audio/eval-consultation/clip_0001.wav
```

Intended Use

MedASR is designed for medical speech recognition: doctor-patient conversations, clinical dictation, and medical terminology. This int8 variant is ideal when storage is constrained but you need lossless accuracy.

This model is not intended for clinical diagnosis or treatment without appropriate validation and regulatory authorization.

Limitations

  • English only
  • Optimized for medical domain speech
  • Raw output includes formatting tokens ({period}, {comma}, etc.) that require post-processing (see the sketch after this list)
  • Requires Apple Silicon hardware (M1+ Mac or A17+ iPhone)
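
As an example, a small post-processing pass can map those formatting tokens to punctuation before display. The token inventory below is illustrative and may not match the model's full set:

```python
import re

# Illustrative mapping; the model's actual formatting-token inventory may differ.
FORMATTING_TOKENS = {
    "{period}": ".",
    "{comma}": ",",
    "{question_mark}": "?",
}

def postprocess(raw: str) -> str:
    """Replace formatting tokens with punctuation and tidy up spacing."""
    for token, punct in FORMATTING_TOKENS.items():
        raw = raw.replace(" " + token, punct).replace(token, punct)
    return re.sub(r"\s+", " ", raw).strip()
```

For instance, `postprocess("no known allergies {period}")` would yield "no known allergies." in this sketch.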

License

The use of this model is governed by the Health AI Developer Foundations Terms of Use. Source code components are licensed under Apache 2.0.


Citation

@misc{medasr-mlx,
  title={MedASR-MLX: On-Device Medical Speech Recognition for Apple Silicon},
  author={Ali Ihsan Nergiz},
  year={2026},
  url={https://huggingface.co/ainergiz/medasr-mlx-int8}
}

Acknowledgments
