Distil-Whisper: Optimized for Qualcomm Devices

Distil-Whisper Small English is a distilled version of Whisper Small, optimized for fast and efficient automatic speech recognition.

This model is based on the implementation of Distil-Whisper found here. This repository contains pre-exported model files optimized for Qualcomm® devices. To export with custom configurations, use the Qualcomm® AI Hub Models library. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

| Runtime | Precision | Chipset | SDK Versions | Download |
| --- | --- | --- | --- | --- |
| ONNX | float | Universal | QAIRT 2.42, ONNX Runtime 1.24.3 | Download |
| QNN_DLC | float | Universal | QAIRT 2.43 | Download |
| TFLITE | float | Universal | QAIRT 2.43, TFLite 2.19.1 | Download |

For more device-specific assets and performance metrics, visit Distil-Whisper on Qualcomm® AI Hub.

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

  • Custom weights (e.g., fine-tuned checkpoints)
  • Custom input shapes
  • Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See the Distil-Whisper repository on GitHub for usage instructions.

Model Details

Model Type: Speech recognition

Model Stats:

  • Model checkpoint: distil-whisper/distil-small.en
  • Input resolution: 80x3000 (30 seconds of audio)
  • Max decoded sequence length: 200 tokens
  • Number of parameters (encoder): 166M
  • Model size (encoder) (float): 332 MB
  • Number of parameters (decoder): 211M
  • Model size (decoder) (float): 450 MB
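
The stats above can be cross-checked with some quick arithmetic. The sketch below assumes the standard Whisper front end (16 kHz audio and a hop of 160 samples between spectrogram frames), which this card does not itself state, and derives the implied bytes per parameter from the listed sizes. The constant and function names are illustrative only.

```python
# Cross-check of the model stats. The sample rate and hop length are the
# standard Whisper front-end values, assumed here; the card does not state them.
SAMPLE_RATE_HZ = 16_000   # assumed
HOP_LENGTH = 160          # assumed: samples between spectrogram frames
N_MELS = 80               # first dimension of the 80x3000 input
CLIP_SECONDS = 30         # audio window length from the card


def spectrogram_shape(seconds: int = CLIP_SECONDS) -> tuple[int, int]:
    """Return (mel bins, frames) for a clip of the given length."""
    frames = seconds * SAMPLE_RATE_HZ // HOP_LENGTH
    return (N_MELS, frames)


def bytes_per_parameter(size_mb: float, params_millions: float) -> float:
    """Implied storage per parameter, treating 1 MB as 10**6 bytes."""
    return size_mb / params_millions


if __name__ == "__main__":
    print(spectrogram_shape())                        # input shape for 30 s
    print(round(bytes_per_parameter(332, 166), 2))    # encoder
    print(round(bytes_per_parameter(450, 211), 2))    # decoder
```

Under these assumptions the frame count matches the listed 80x3000 input, and the encoder works out to 2 bytes per parameter. That would be consistent with 16-bit weights, although the card does not say how the float assets are stored, and the decoder ratio is slightly higher.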

Performance Summary

| Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit |
| --- | --- | --- | --- | --- | --- | --- |
| decoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.544 | 51 - 375 | NPU |
| decoder | ONNX | float | Snapdragon® X2 Elite | 5.102 | 178 - 178 | NPU |
| decoder | ONNX | float | Snapdragon® X Elite | 11.168 | 178 - 178 | NPU |
| decoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 8.615 | 0 - 364 | NPU |
| decoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 11.602 | 0 - 195 | NPU |
| decoder | ONNX | float | Qualcomm® QCS9075 | 13.163 | 40 - 82 | NPU |
| decoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.186 | 16 - 476 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.544 | 17 - 326 | NPU |
| decoder | QNN_DLC | float | Snapdragon® X2 Elite | 5.615 | 40 - 40 | NPU |
| decoder | QNN_DLC | float | Snapdragon® X Elite | 10.866 | 40 - 40 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 8.59 | 17 - 330 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 18.893 | 30 - 246 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 11.254 | 40 - 42 | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA8775P | 12.908 | 19 - 236 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS9075 | 13.082 | 40 - 86 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 18.076 | 36 - 345 | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA7255P | 18.893 | 30 - 246 | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA8295P | 14.217 | 20 - 271 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.215 | 5 - 447 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.627 | 4 - 480 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 8.471 | 4 - 556 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 18.822 | 5 - 450 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 11.489 | 5 - 7 | NPU |
| decoder | TFLITE | float | Qualcomm® SA8775P | 12.895 | 5 - 450 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS9075 | 13.074 | 2 - 266 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 18.592 | 5 - 524 | NPU |
| decoder | TFLITE | float | Qualcomm® SA7255P | 18.822 | 5 - 450 | NPU |
| decoder | TFLITE | float | Qualcomm® SA8295P | 14.156 | 5 - 443 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 6.994 | 3 - 504 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 50.657 | 77 - 836 | NPU |
| encoder | ONNX | float | Snapdragon® X2 Elite | 50.82 | 183 - 183 | NPU |
| encoder | ONNX | float | Snapdragon® X Elite | 124.045 | 182 - 182 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 82.002 | 86 - 1245 | NPU |
| encoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 118.492 | 0 - 197 | NPU |
| encoder | ONNX | float | Qualcomm® QCS9075 | 150.326 | 79 - 83 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 60.397 | 81 - 774 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 49.034 | 0 - 742 | NPU |
| encoder | QNN_DLC | float | Snapdragon® X2 Elite | 48.393 | 1 - 1 | NPU |
| encoder | QNN_DLC | float | Snapdragon® X Elite | 121.663 | 1 - 1 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 82.41 | 0 - 1046 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 397.445 | 1 - 731 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 118.837 | 1 - 3 | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA8775P | 139.846 | 1 - 733 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS9075 | 152.427 | 1 - 39 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 276.295 | 1 - 917 | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA7255P | 397.445 | 1 - 731 | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA8295P | 207.259 | 1 - 668 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 60.835 | 1 - 665 | NPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 419.256 | 41 - 81 | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 479.748 | 0 - 144 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 3116.266 | 38 - 82 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 655.878 | 0 - 313 | GPU |
| encoder | TFLITE | float | Qualcomm® SA8775P | 1326.077 | 27 - 71 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS9075 | 1270.615 | 0 - 40 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 843.956 | 39 - 193 | GPU |
| encoder | TFLITE | float | Qualcomm® SA7255P | 3116.266 | 38 - 82 | GPU |
| encoder | TFLITE | float | Qualcomm® SA8295P | 667.309 | 40 - 83 | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 409.596 | 41 - 80 | GPU |
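
To put the table in terms of a full transcription, the sketch below estimates the latency of one 30-second window as a single encoder pass plus one decoder pass per generated token. That cost model is an assumption (the table does not say whether the decoder time is per token), it ignores audio pre-processing and token post-processing, and the function names are illustrative. The figures are the QNN_DLC rows for Snapdragon® 8 Elite Gen 5 Mobile.

```python
# Rough latency estimate for one 30-second window.
# ASSUMPTION: the decoder figure is the cost of a single decoding step, so a
# transcript of N tokens costs one encoder pass plus N decoder passes.
ENCODER_MS = 49.034    # encoder, QNN_DLC, Snapdragon 8 Elite Gen 5 Mobile
DECODER_MS = 5.544     # decoder, same runtime and chipset
MAX_TOKENS = 200       # max decoded sequence length from Model Stats
WINDOW_SECONDS = 30    # audio covered by one encoder input


def estimate_latency_ms(encoder_ms: float, decoder_ms: float, tokens: int) -> float:
    """One encoder pass plus one decoder pass per generated token."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return encoder_ms + decoder_ms * tokens


def real_time_factor(latency_ms: float, audio_seconds: float = WINDOW_SECONDS) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return latency_ms / (audio_seconds * 1000.0)


if __name__ == "__main__":
    worst_case = estimate_latency_ms(ENCODER_MS, DECODER_MS, MAX_TOKENS)
    print(f"worst-case latency: {worst_case:.1f} ms")
    print(f"real-time factor:   {real_time_factor(worst_case):.3f}")
```

Under this model the worst case of 200 tokens comes to roughly 1.16 seconds for 30 seconds of audio. The same function also makes the runtime gap visible: swapping in a TFLITE encoder figure, which the table reports on the GPU rather than the NPU, raises the estimate by the difference in encoder time.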

License

  • The license for the original implementation of Distil-Whisper can be found here.
