Distil-Whisper: Optimized for Qualcomm Devices

Distil-Whisper Small English is a distilled version of Whisper Small, optimized for fast and efficient automatic speech recognition.

This is based on the implementation of Distil-Whisper found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export the model with custom configurations. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

Runtime	Precision	Chipset	SDK Versions	Download
ONNX	float	Universal	QAIRT 2.42, ONNX Runtime 1.25.0	Download
QNN_DLC	float	Universal	QAIRT 2.45	Download
TFLITE	float	Universal		Download

For more device-specific assets and performance metrics, visit Distil-Whisper on Qualcomm® AI Hub.

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

Custom weights (e.g., fine-tuned checkpoints)
Custom input shapes
Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See our repository for Distil-Whisper on GitHub for usage instructions.

Model Details

Model Type: Speech recognition

Model Stats:

Model checkpoint: distil-whisper/distil-small.en
Input shape: 80x3000 (80 Mel bins x 3000 frames, i.e. 30 seconds of audio)
Max decoded sequence length: 200 tokens
Number of parameters (encoder): 166M
Model size (encoder, float): 332 MB
Number of parameters (decoder): 211M
Model size (decoder, float): 450 MB
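The 80x3000 input follows the standard Whisper front end: audio is resampled to 16 kHz and framed with a 10 ms hop (160 samples), so one 30-second window is 480,000 samples and 3,000 spectrogram frames of 80 Mel bins each. The sketch below shows that arithmetic and the pad-or-trim step that fixes every clip to one window. It assumes 16 kHz mono audio already decoded into a list of floats; the helper names `pad_or_trim` and `num_windows` are hypothetical, the simple back-to-back 30 s chunking in `num_windows` is an assumption (not necessarily how the exported app segments long audio), and the log-Mel computation itself is not shown.

```python
import math

# Whisper-family front-end constants.
SAMPLE_RATE = 16_000      # audio is resampled to 16 kHz
HOP_LENGTH = 160          # 10 ms hop between spectrogram frames
CHUNK_SECONDS = 30        # the encoder sees fixed 30-second windows
N_MELS = 80               # Mel bins per frame

N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3,000 frames per window


def pad_or_trim(samples: list[float], length: int = N_SAMPLES) -> list[float]:
    """Zero-pad short clips and truncate long ones to exactly `length` samples."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


def num_windows(duration_s: float) -> int:
    """Hypothetical helper: 30 s windows (encoder calls) needed if a clip is split back to back."""
    return max(1, math.ceil(duration_s / CHUNK_SECONDS))


# Usage: a hypothetical 12-second clip is zero-padded to one full window.
clip = [0.1] * (SAMPLE_RATE * 12)
window = pad_or_trim(clip)
print(len(window), N_FRAMES, (N_MELS, N_FRAMES))
```

Padding short clips with silence (rather than feeding variable-length input) is what lets the exported encoder use a single static 80x3000 input shape.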

Performance Summary

Model	Runtime	Precision	Chipset	Inference Time	Peak Memory Range	Primary Compute Unit
decoder	ONNX	float	Snapdragon® 8 Elite Gen 5 Mobile	5.535 ms	46 - 366 MB	NPU
decoder	ONNX	float	Snapdragon® X2 Elite	5.117 ms	171 - 171 MB	NPU
decoder	ONNX	float	Snapdragon® X Elite	10.702 ms	210 - 210 MB	NPU
decoder	ONNX	float	Snapdragon® 8 Gen 3 Mobile	8.52 ms	50 - 421 MB	NPU
decoder	ONNX	float	Qualcomm® QCS8550 (Proxy)	11.595 ms	40 - 54 MB	NPU
decoder	ONNX	float	Snapdragon® 8 Elite For Galaxy Mobile	7.184 ms	14 - 470 MB	NPU
decoder	ONNX	float	Qualcomm® QCS9075	13.322 ms	40 - 85 MB	NPU
decoder	ONNX	float	Qualcomm® QCS8750	7.184 ms	14 - 470 MB	NPU
decoder	ONNX	float	Qualcomm® QCS7181	10.702 ms	210 - 210 MB	NPU
decoder	QNN_DLC	float	Snapdragon® 8 Elite Gen 5 Mobile	5.931 ms	1 - 500 MB	NPU
decoder	QNN_DLC	float	Snapdragon® X2 Elite	5.938 ms	40 - 40 MB	NPU
decoder	QNN_DLC	float	Snapdragon® X Elite	10.942 ms	40 - 40 MB	NPU
decoder	QNN_DLC	float	Snapdragon® 8 Gen 3 Mobile	8.733 ms	40 - 644 MB	NPU
decoder	QNN_DLC	float	Qualcomm® QCS8550 (Proxy)	11.447 ms	40 - 43 MB	NPU
decoder	QNN_DLC	float	Qualcomm® SA8775P	12.761 ms	29 - 520 MB	NPU
decoder	QNN_DLC	float	Qualcomm® SA8650P	12.761 ms	29 - 520 MB	NPU
decoder	QNN_DLC	float	Qualcomm® SA8255P	12.761 ms	29 - 520 MB	NPU
decoder	QNN_DLC	float	Qualcomm® QCS8450 (Proxy)	18.059 ms	36 - 338 MB	NPU
decoder	QNN_DLC	float	Qualcomm® SA8295P	14.092 ms	20 - 262 MB	NPU
decoder	QNN_DLC	float	Snapdragon® 8 Elite For Galaxy Mobile	7.308 ms	4 - 549 MB	NPU
decoder	QNN_DLC	float	Qualcomm® QCS9075	16.802 ms	40 - 86 MB	NPU
decoder	QNN_DLC	float	Qualcomm® QCS8750	7.308 ms	4 - 549 MB	NPU
decoder	QNN_DLC	float	Qualcomm® QCS7181	10.942 ms	40 - 40 MB	NPU
decoder	TFLITE	float	Snapdragon® 8 Elite Gen 5 Mobile	5.819 ms	4 - 570 MB	NPU
decoder	TFLITE	float	Snapdragon® 8 Gen 3 Mobile	8.613 ms	2 - 748 MB	NPU
decoder	TFLITE	float	Qualcomm® QCS8550 (Proxy)	11.573 ms	0 - 182 MB	NPU
decoder	TFLITE	float	Qualcomm® SA8775P	12.999 ms	5 - 538 MB	NPU
decoder	TFLITE	float	Qualcomm® SA8650P	12.999 ms	5 - 538 MB	NPU
decoder	TFLITE	float	Qualcomm® SA8255P	12.999 ms	5 - 538 MB	NPU
decoder	TFLITE	float	Qualcomm® QCS8450 (Proxy)	18.354 ms	5 - 468 MB	NPU
decoder	TFLITE	float	Qualcomm® SA8295P	13.812 ms	5 - 297 MB	NPU
decoder	TFLITE	float	Snapdragon® 8 Elite For Galaxy Mobile	7.221 ms	5 - 576 MB	NPU
decoder	TFLITE	float	Qualcomm® QCS9075	16.365 ms	0 - 265 MB	NPU
decoder	TFLITE	float	Qualcomm® QCS8750	7.221 ms	5 - 576 MB	NPU
encoder	ONNX	float	Snapdragon® 8 Elite Gen 5 Mobile	51.586 ms	17 - 772 MB	NPU
encoder	ONNX	float	Snapdragon® X2 Elite	51.711 ms	211 - 211 MB	NPU
encoder	ONNX	float	Snapdragon® X Elite	122.978 ms	182 - 182 MB	NPU
encoder	ONNX	float	Snapdragon® 8 Gen 3 Mobile	82.292 ms	83 - 1245 MB	NPU
encoder	ONNX	float	Qualcomm® QCS8550 (Proxy)	117.874 ms	0 - 195 MB	NPU
encoder	ONNX	float	Snapdragon® 8 Elite For Galaxy Mobile	60.662 ms	80 - 764 MB	NPU
encoder	ONNX	float	Qualcomm® QCS9075	151.286 ms	79 - 124 MB	NPU
encoder	ONNX	float	Qualcomm® QCS8750	60.662 ms	80 - 764 MB	NPU
encoder	ONNX	float	Qualcomm® QCS7181	122.978 ms	182 - 182 MB	NPU
encoder	QNN_DLC	float	Snapdragon® 8 Elite Gen 5 Mobile	59.574 ms	1 - 712 MB	NPU
encoder	QNN_DLC	float	Snapdragon® X2 Elite	59.563 ms	1 - 1 MB	NPU
encoder	QNN_DLC	float	Snapdragon® X Elite	139.429 ms	1 - 1 MB	NPU
encoder	QNN_DLC	float	Snapdragon® 8 Gen 3 Mobile	97.796 ms	0 - 964 MB	NPU
encoder	QNN_DLC	float	Qualcomm® QCS8550 (Proxy)	135.706 ms	2 - 4 MB	NPU
encoder	QNN_DLC	float	Qualcomm® SA8775P	153.533 ms	1 - 687 MB	NPU
encoder	QNN_DLC	float	Qualcomm® SA8650P	153.533 ms	1 - 687 MB	NPU
encoder	QNN_DLC	float	Qualcomm® SA8255P	153.533 ms	1 - 687 MB	NPU
encoder	QNN_DLC	float	Qualcomm® QCS8450 (Proxy)	268.763 ms	2 - 823 MB	NPU
encoder	QNN_DLC	float	Qualcomm® SA8295P	192.924 ms	1 - 611 MB	NPU
encoder	QNN_DLC	float	Snapdragon® 8 Elite For Galaxy Mobile	71.541 ms	1 - 691 MB	NPU
encoder	QNN_DLC	float	Qualcomm® QCS9075	170.815 ms	1 - 39 MB	NPU
encoder	QNN_DLC	float	Qualcomm® QCS8750	71.541 ms	1 - 691 MB	NPU
encoder	QNN_DLC	float	Qualcomm® QCS7181	139.429 ms	1 - 1 MB	NPU
encoder	TFLITE	float	Snapdragon® 8 Elite Gen 5 Mobile	398.067 ms	41 - 80 MB	GPU
encoder	TFLITE	float	Snapdragon® 8 Gen 3 Mobile	478.691 ms	40 - 186 MB	GPU
encoder	TFLITE	float	Qualcomm® QCS8550 (Proxy)	648.522 ms	0 - 314 MB	GPU
encoder	TFLITE	float	Qualcomm® SA8775P	1308.127 ms	29 - 73 MB	GPU
encoder	TFLITE	float	Qualcomm® SA8650P	1308.127 ms	29 - 73 MB	GPU
encoder	TFLITE	float	Qualcomm® SA8255P	1308.127 ms	29 - 73 MB	GPU
encoder	TFLITE	float	Qualcomm® QCS8450 (Proxy)	843.394 ms	40 - 192 MB	GPU
encoder	TFLITE	float	Qualcomm® SA8295P	667.144 ms	40 - 84 MB	GPU
encoder	TFLITE	float	Snapdragon® 8 Elite For Galaxy Mobile	407.522 ms	42 - 81 MB	GPU
encoder	TFLITE	float	Qualcomm® QCS9075	1270.831 ms	0 - 40 MB	GPU
encoder	TFLITE	float	Qualcomm® QCS8750	407.522 ms	42 - 81 MB	GPU
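The encoder runs once per 30-second window, while an autoregressive decoder runs once per generated token. Assuming (as is typical for Whisper-style decoding, but not stated in the table) that each decoder figure above is the cost of a single decoding step, a rough per-window latency is encoder time + tokens x decoder time, capped at the 200-token maximum. The sketch below applies this to a few rows copied from the table. The helper names are hypothetical, and the estimate ignores preprocessing, tokenization, and data-transfer overheads, so treat the results as back-of-the-envelope figures rather than measurements.

```python
# (encoder_ms, decoder_ms_per_step) copied from the Performance Summary table.
PROFILES = {
    ("ONNX", "Snapdragon® 8 Elite Gen 5 Mobile"): (51.586, 5.535),
    ("QNN_DLC", "Snapdragon® 8 Elite Gen 5 Mobile"): (59.574, 5.931),
    ("TFLITE", "Snapdragon® 8 Elite Gen 5 Mobile"): (398.067, 5.819),
    ("ONNX", "Snapdragon® X Elite"): (122.978, 10.702),
}

WINDOW_MS = 30_000   # one encoder call covers 30 s of audio
MAX_TOKENS = 200     # max decoded sequence length per window


def estimate_window_ms(encoder_ms: float, decoder_ms: float,
                       n_tokens: int = MAX_TOKENS) -> float:
    """Rough latency for one window: one encoder pass plus n_tokens decoder steps."""
    if not 0 <= n_tokens <= MAX_TOKENS:
        raise ValueError("n_tokens must be between 0 and MAX_TOKENS")
    return encoder_ms + n_tokens * decoder_ms


def real_time_factor(latency_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return latency_ms / WINDOW_MS


for (runtime, chip), (enc, dec) in PROFILES.items():
    worst = estimate_window_ms(enc, dec)  # assumes all 200 tokens are generated
    print(f"{runtime:8s} {chip:34s} worst case {worst:8.1f} ms  "
          f"RTF {real_time_factor(worst):.3f}")
```

Under this assumption, decoding dominates on the NPU paths: 200 steps at 5.535 ms is about 1.1 s, versus about 52 ms for the ONNX encoder on Snapdragon® 8 Elite Gen 5 Mobile. The runtime choice matters most for the encoder, where the TFLite build falls back to the GPU at roughly 398 ms.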

License

The license for the original implementation of Distil-Whisper can be found here.

References

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (arXiv:2311.00430, published Nov 1, 2023)

Community

Join our AI Hub Slack community to collaborate, post questions, and learn more about on-device AI.
For questions or feedback, please reach out to us.