Indic Conformer 600M Quantized
This repository contains an int8-quantized version of the Indic Conformer model, a large-scale automatic speech recognition (ASR) model created by AI4Bharat for Indic languages. The original model is ai4bharat/indic-conformer-600m-multilingual.
Benchmarks
These benchmarks were run for Hindi on the Google Colab free tier with a Tesla T4 GPU. WER is word error rate and CER is character error rate; lower is better for both.
You can use the notebooks in the scripts directory to reproduce these results or to compute them for other languages.
| Decoding Method | FP32 WER | int8 WER | FP32 CER | int8 CER |
|---|---|---|---|---|
| CTC | 0.1645 | 0.2985 | 0.0661 | 0.1698 |
| RNNT | 0.1508 | 0.2939 | 0.0642 | 0.149 |
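For reference, WER and CER are edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length (in words for WER, in characters for CER). The sketch below is a minimal pure-Python illustration of that definition. The function names are illustrative, and the notebooks in the scripts directory may compute the metrics differently (for example with a dedicated library or with text normalization).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length.

    Spaces are counted as characters in this simple version.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted word out of four reference words.
print(wer("a b c d", "a x c d"))  # 0.25
```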
Model Details
- Model Type: Automatic Speech Recognition (ASR)
- Architecture: Conformer encoder with both CTC (Connectionist Temporal Classification) and RNNT (Recurrent Neural Network Transducer) decoders
- Quantization: int8 quantization for reduced model size and faster inference
- Parameters: Approximately 600 million parameters
- Languages Supported: Assamese (as), Bengali (bn), Bodo (brx), Dogri (doi), Gujarati (gu), Hindi (hi), Kannada (kn), Konkani (kok), Maithili (mai), Malayalam (ml), Marathi (mr), Nepali (ne), Odia (or), Punjabi (pa), Sanskrit (sa), Santali (sat), Sindhi (sd), Tamil (ta), Telugu (te), Urdu (ur)
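To make the int8 entry in the list above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a generic illustration of the idea, not the procedure used to produce the files in this repository. The function names are hypothetical, and real toolchains typically add calibration data and finer-grained scales.

```python
def quantize_int8(values):
    """Map floats to int8 codes using one shared scale (symmetric, per-tensor)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Choose the scale so the largest magnitude maps to 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.013, 1.0]  # made-up example values
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Each restored value is within half a quantization step of the original.
for w, r in zip(weights, restored):
    assert abs(w - r) <= scale / 2 + 1e-12
```

Each value is stored in one byte instead of the four bytes of an FP32 value, which is where the size reduction comes from. The rounding error introduced at this step is one source of the accuracy gap between FP32 and int8 models.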
Intended Use
This model is intended for transcribing speech in Indic languages into text. It can be used for applications such as voice assistants, transcription services, and accessibility tools.
Usage
Installation
To use this model, simply install the helper package:
pip install indic-asr-onnx
Loading the Model
from indic_asr_onnx import IndicTranscriber
# Initialize (downloads model automatically)
transcriber = IndicTranscriber()
Inference
# Transcribe audio using CTC head
text = transcriber.transcribe_ctc("audio.wav", "hi") # Hindi
print(text)
# Transcribe audio using RNNT head
text = transcriber.transcribe_rnnt("audio.wav", "hi") # Hindi
print(text)
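When there are several recordings to process, the two calls above can be wrapped in a small helper. The sketch below relies only on the `transcribe_ctc` and `transcribe_rnnt` methods shown above. `transcribe_folder` and its parameters are hypothetical names introduced for illustration and are not part of the indic-asr-onnx package.

```python
from pathlib import Path


def transcribe_folder(transcriber, folder, lang, decoder="ctc"):
    """Transcribe every .wav file in `folder` and return {file name: text}.

    `transcriber` is any object exposing transcribe_ctc(path, lang) and
    transcribe_rnnt(path, lang), such as the IndicTranscriber created above.
    """
    if decoder == "ctc":
        run = transcriber.transcribe_ctc
    elif decoder == "rnnt":
        run = transcriber.transcribe_rnnt
    else:
        raise ValueError("decoder must be 'ctc' or 'rnnt'")

    results = {}
    # Sort so the output order is stable across runs.
    for path in sorted(Path(folder).glob("*.wav")):
        results[path.name] = run(str(path), lang)
    return results
```

For example, `transcribe_folder(transcriber, "recordings", "hi", decoder="rnnt")` would return a dictionary of Hindi transcripts keyed by file name, where `recordings` is a placeholder directory.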
Model Files
Config Subfolder
- vocab.json: Subword vocabulary for supported languages
- language_masks.json: Language-specific masks for handling multilingual inputs
ONNX Subfolder
- ctc_decoder_quantized_int8.onnx: Quantized CTC decoder for connectionist temporal classification
- encoder_quantized_int8.onnx: Quantized Conformer encoder for feature extraction from audio
- joint_enc_quantized_int8.onnx: Quantized joint encoder component for RNN-T decoding
- joint_pre_net_quantized_int8.onnx: Quantized joint pre-net for preprocessing in RNN-T
- joint_pred_quantized_int8.onnx: Quantized joint predictor for RNN-T decoding
- rnnt_decoder_quantized_int8.onnx: Quantized RNN-T decoder for recurrent neural network transducer
- adapters/*: Language-specific quantized joint post-net adapters for each supported language (e.g., joint_post_net_hi_quantized_int8.onnx for Hindi)
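The file layout above can be captured in a small helper that lists the ONNX files a given language needs, which is handy for checking a local copy. This is a sketch based only on the file names listed here. The adapter naming pattern is extrapolated from the Hindi example and the `adapters/` location relative to the ONNX subfolder is an assumption. `expected_onnx_files` and `missing_files` are hypothetical helpers, not part of any package.

```python
from pathlib import Path

# Language-independent files, as listed in this section.
SHARED_FILES = [
    "ctc_decoder_quantized_int8.onnx",
    "encoder_quantized_int8.onnx",
    "joint_enc_quantized_int8.onnx",
    "joint_pre_net_quantized_int8.onnx",
    "joint_pred_quantized_int8.onnx",
    "rnnt_decoder_quantized_int8.onnx",
]


def expected_onnx_files(lang):
    """Return the relative paths expected for one language code, e.g. 'hi'."""
    # Assumed pattern, generalized from joint_post_net_hi_quantized_int8.onnx.
    adapter = f"adapters/joint_post_net_{lang}_quantized_int8.onnx"
    return SHARED_FILES + [adapter]


def missing_files(onnx_dir, lang):
    """List expected files that are not present under a local ONNX directory."""
    root = Path(onnx_dir)
    return [name for name in expected_onnx_files(lang)
            if not (root / name).is_file()]
```

Calling `missing_files("path/to/onnx", "hi")` on a complete local copy should return an empty list under these assumptions; the path is a placeholder.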
Training Data
Calibration Dataset: https://www.kaggle.com/datasets/haposeiz/indicvoices-calibration-1408
The Calibration Dataset was curated from the Indic Voices Dataset.
Additional Links
GitHub: https://github.com/atharva-again/indic-asr-onnx
Contact
For questions or issues, open an issue on this repository or on GitHub, or email me at atharva.verma18@gmail.com.
Model tree for atharva-again/indic-conformer-600m-quantized
Base model
ai4bharat/indic-conformer-600m-multilingual