Distil-Whisper: Optimized for Qualcomm Devices

Distil-Whisper Small English is a distilled version of Whisper Small, optimized for fast and efficient automatic speech recognition.

This model is based on the implementation of Distil-Whisper found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export with custom configurations. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

Runtime | Precision | Chipset | SDK Versions | Download
ONNX | float | Universal | QAIRT 2.42, ONNX Runtime 1.25.0 | Download
QNN_DLC | float | Universal | QAIRT 2.45 | Download
TFLITE | float | Universal | - | Download

For more device-specific assets and performance metrics, visit Distil-Whisper on Qualcomm® AI Hub.

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

  • Custom weights (e.g., fine-tuned checkpoints)
  • Custom input shapes
  • Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See our repository for Distil-Whisper on GitHub for usage instructions.

Model Details

Model Type: Speech recognition

Model Stats:

  • Model checkpoint: distil-whisper/distil-small.en
  • Input resolution: 80x3000 (80 mel bins x 3000 frames, 30 seconds of audio)
  • Max decoded sequence length: 200 tokens
  • Number of parameters (encoder): 166M
  • Model size (encoder) (float): 332 MB
  • Number of parameters (decoder): 211M
  • Model size (decoder) (float): 450 MB
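
The input resolution and size figures above can be cross-checked with a little arithmetic. The sketch below assumes the usual Whisper front end (16 kHz audio, a hop of 160 samples per frame, 80 mel bins) and 2 bytes per parameter for the size estimate. None of these constants are stated in this card, and the helper names are hypothetical, so treat the result as an illustration rather than a specification.

```python
# Assumed Whisper front-end constants (not stated in this model card).
SAMPLE_RATE_HZ = 16_000   # audio sampling rate
HOP_LENGTH = 160          # samples between consecutive spectrogram frames
N_MELS = 80               # mel frequency bins
CHUNK_SECONDS = 30        # fixed audio window per encoder call


def input_shape(seconds: int = CHUNK_SECONDS) -> tuple:
    """Return (mel bins, frames) for an audio window of the given length."""
    n_samples = seconds * SAMPLE_RATE_HZ
    n_frames = n_samples // HOP_LENGTH
    return (N_MELS, n_frames)


def estimated_size_mb(params_millions: float, bytes_per_param: int = 2) -> float:
    """Rough weight size in MB, assuming a fixed number of bytes per parameter."""
    return float(params_millions) * bytes_per_param


encoder_shape = input_shape()
encoder_mb = estimated_size_mb(166)   # encoder: 166M parameters
decoder_mb = estimated_size_mb(211)   # decoder: 211M parameters

print(encoder_shape)  # (80, 3000)
print(encoder_mb)     # 332.0
print(decoder_mb)     # 422.0
```

Under these assumptions the encoder estimate matches the listed 332 MB, while the decoder estimate (422 MB) is lower than the listed 450 MB, so the estimate should be read as approximate.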

Performance Summary

Model | Runtime | Precision | Chipset | Inference Time | Peak Memory Range | Primary Compute Unit
decoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.535 ms | 46 - 366 MB | NPU
decoder | ONNX | float | Snapdragon® X2 Elite | 5.117 ms | 171 - 171 MB | NPU
decoder | ONNX | float | Snapdragon® X Elite | 10.702 ms | 210 - 210 MB | NPU
decoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 8.52 ms | 50 - 421 MB | NPU
decoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 11.595 ms | 40 - 54 MB | NPU
decoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.184 ms | 14 - 470 MB | NPU
decoder | ONNX | float | Qualcomm® QCS9075 | 13.322 ms | 40 - 85 MB | NPU
decoder | ONNX | float | Qualcomm® QCS8750 | 7.184 ms | 14 - 470 MB | NPU
decoder | ONNX | float | Qualcomm® QCS7181 | 10.702 ms | 210 - 210 MB | NPU
decoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.931 ms | 1 - 500 MB | NPU
decoder | QNN_DLC | float | Snapdragon® X2 Elite | 5.938 ms | 40 - 40 MB | NPU
decoder | QNN_DLC | float | Snapdragon® X Elite | 10.942 ms | 40 - 40 MB | NPU
decoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 8.733 ms | 40 - 644 MB | NPU
decoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 11.447 ms | 40 - 43 MB | NPU
decoder | QNN_DLC | float | Qualcomm® SA8775P | 12.761 ms | 29 - 520 MB | NPU
decoder | QNN_DLC | float | Qualcomm® SA8650P | 12.761 ms | 29 - 520 MB | NPU
decoder | QNN_DLC | float | Qualcomm® SA8255P | 12.761 ms | 29 - 520 MB | NPU
decoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 18.059 ms | 36 - 338 MB | NPU
decoder | QNN_DLC | float | Qualcomm® SA8295P | 14.092 ms | 20 - 262 MB | NPU
decoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.308 ms | 4 - 549 MB | NPU
decoder | QNN_DLC | float | Qualcomm® QCS9075 | 16.802 ms | 40 - 86 MB | NPU
decoder | QNN_DLC | float | Qualcomm® QCS8750 | 7.308 ms | 4 - 549 MB | NPU
decoder | QNN_DLC | float | Qualcomm® QCS7181 | 10.942 ms | 40 - 40 MB | NPU
decoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.819 ms | 4 - 570 MB | NPU
decoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 8.613 ms | 2 - 748 MB | NPU
decoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 11.573 ms | 0 - 182 MB | NPU
decoder | TFLITE | float | Qualcomm® SA8775P | 12.999 ms | 5 - 538 MB | NPU
decoder | TFLITE | float | Qualcomm® SA8650P | 12.999 ms | 5 - 538 MB | NPU
decoder | TFLITE | float | Qualcomm® SA8255P | 12.999 ms | 5 - 538 MB | NPU
decoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 18.354 ms | 5 - 468 MB | NPU
decoder | TFLITE | float | Qualcomm® SA8295P | 13.812 ms | 5 - 297 MB | NPU
decoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.221 ms | 5 - 576 MB | NPU
decoder | TFLITE | float | Qualcomm® QCS9075 | 16.365 ms | 0 - 265 MB | NPU
decoder | TFLITE | float | Qualcomm® QCS8750 | 7.221 ms | 5 - 576 MB | NPU
encoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 51.586 ms | 17 - 772 MB | NPU
encoder | ONNX | float | Snapdragon® X2 Elite | 51.711 ms | 211 - 211 MB | NPU
encoder | ONNX | float | Snapdragon® X Elite | 122.978 ms | 182 - 182 MB | NPU
encoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 82.292 ms | 83 - 1245 MB | NPU
encoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 117.874 ms | 0 - 195 MB | NPU
encoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 60.662 ms | 80 - 764 MB | NPU
encoder | ONNX | float | Qualcomm® QCS9075 | 151.286 ms | 79 - 124 MB | NPU
encoder | ONNX | float | Qualcomm® QCS8750 | 60.662 ms | 80 - 764 MB | NPU
encoder | ONNX | float | Qualcomm® QCS7181 | 122.978 ms | 182 - 182 MB | NPU
encoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 59.574 ms | 1 - 712 MB | NPU
encoder | QNN_DLC | float | Snapdragon® X2 Elite | 59.563 ms | 1 - 1 MB | NPU
encoder | QNN_DLC | float | Snapdragon® X Elite | 139.429 ms | 1 - 1 MB | NPU
encoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 97.796 ms | 0 - 964 MB | NPU
encoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 135.706 ms | 2 - 4 MB | NPU
encoder | QNN_DLC | float | Qualcomm® SA8775P | 153.533 ms | 1 - 687 MB | NPU
encoder | QNN_DLC | float | Qualcomm® SA8650P | 153.533 ms | 1 - 687 MB | NPU
encoder | QNN_DLC | float | Qualcomm® SA8255P | 153.533 ms | 1 - 687 MB | NPU
encoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 268.763 ms | 2 - 823 MB | NPU
encoder | QNN_DLC | float | Qualcomm® SA8295P | 192.924 ms | 1 - 611 MB | NPU
encoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 71.541 ms | 1 - 691 MB | NPU
encoder | QNN_DLC | float | Qualcomm® QCS9075 | 170.815 ms | 1 - 39 MB | NPU
encoder | QNN_DLC | float | Qualcomm® QCS8750 | 71.541 ms | 1 - 691 MB | NPU
encoder | QNN_DLC | float | Qualcomm® QCS7181 | 139.429 ms | 1 - 1 MB | NPU
encoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 398.067 ms | 41 - 80 MB | GPU
encoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 478.691 ms | 40 - 186 MB | GPU
encoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 648.522 ms | 0 - 314 MB | GPU
encoder | TFLITE | float | Qualcomm® SA8775P | 1308.127 ms | 29 - 73 MB | GPU
encoder | TFLITE | float | Qualcomm® SA8650P | 1308.127 ms | 29 - 73 MB | GPU
encoder | TFLITE | float | Qualcomm® SA8255P | 1308.127 ms | 29 - 73 MB | GPU
encoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 843.394 ms | 40 - 192 MB | GPU
encoder | TFLITE | float | Qualcomm® SA8295P | 667.144 ms | 40 - 84 MB | GPU
encoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 407.522 ms | 42 - 81 MB | GPU
encoder | TFLITE | float | Qualcomm® QCS9075 | 1270.831 ms | 0 - 40 MB | GPU
encoder | TFLITE | float | Qualcomm® QCS8750 | 407.522 ms | 42 - 81 MB | GPU
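
The per-component timings above can be combined into a rough end-to-end figure. The sketch below assumes a simple cost model that this card does not itself state: one encoder pass per 30-second window plus one decoder pass per generated token, with no pre-processing, post-processing, or host overhead. The function name is hypothetical, and the example uses the ONNX float timings for Snapdragon® 8 Elite Gen 5 Mobile together with the 200-token maximum sequence length.

```python
def worst_case_latency_ms(encoder_ms: float, decoder_ms: float,
                          max_tokens: int = 200) -> float:
    """Upper-bound latency for one audio window under the assumed cost model:
    a single encoder pass plus one decoder pass per generated token."""
    return encoder_ms + decoder_ms * max_tokens


# ONNX float timings for Snapdragon 8 Elite Gen 5 Mobile, taken from the table.
ENCODER_MS = 51.586
DECODER_MS = 5.535
WINDOW_MS = 30_000  # each window covers 30 seconds of audio

total_ms = worst_case_latency_ms(ENCODER_MS, DECODER_MS)
real_time_factor = total_ms / WINDOW_MS  # below 1.0 means faster than real time

print(f"{total_ms:.1f} ms")       # 1158.6 ms
print(f"{real_time_factor:.3f}")  # 0.039
```

Most utterances decode far fewer than 200 tokens, so under this model typical latency would be lower than the worst case shown here.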

License

  • The license for the original implementation of Distil-Whisper can be found here.
