Indic Conformer 600M Quantized

This repository contains an int8-quantized version of the Indic Conformer model, a large-scale automatic speech recognition (ASR) model created by AI4Bharat for Indic languages. The original model is ai4bharat/indic-conformer-600m-multilingual.

Benchmarks

These benchmarks were run for Hindi on the Google Colab free tier with a Tesla T4 GPU. You can use the notebooks in the scripts directory to reproduce the results or to compute them for other languages.

Decoding Method	FP32 WER	int8 WER	FP32 CER	int8 CER
CTC	0.1645	0.2985	0.0661	0.1698
RNNT	0.1508	0.2939	0.0642	0.149
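
For readers less familiar with the metrics in the table, WER and CER are edit-distance ratios: the number of word-level (or character-level) substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. The following is a minimal sketch of that definition in plain Python. The function names are illustrative, and the notebooks in the scripts directory may normalize text or use a dedicated library, so their numbers need not match this sketch exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces count as characters in this sketch."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, the CTC row of the table says that roughly 16 in 100 reference words are wrong with FP32 weights and roughly 30 in 100 with int8 weights.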

Model Details

Model Type: Automatic Speech Recognition (ASR)
Architecture: Conformer with both CTC (Connectionist Temporal Classification) and RNNT (Recurrent Neural Network Transducer) decoders
Quantization: int8 quantization for reduced model size and faster inference
Parameters: Approximately 600 million parameters
Languages Supported: Assamese (as), Bengali (bn), Bodo (brx), Dogri (doi), Gujarati (gu), Hindi (hi), Kannada (kn), Konkani (kok), Maithili (mai), Malayalam (ml), Marathi (mr), Nepali (ne), Odia (or), Punjabi (pa), Sanskrit (sa), Santali (sat), Sindhi (sd), Tamil (ta), Telugu (te), Urdu (ur)
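
To illustrate what int8 quantization trades away, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is a generic illustration, not the procedure used to produce the files in this repository (the exact scheme, for example per-channel scales or zero points, is not stated here), and the function names and sample weights are made up for the example.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to show the rounding error.
weights = [0.82, -1.27, 0.003, 0.45]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each int8 code takes 1 byte instead of the 4 bytes of an FP32 value, which is where the size reduction comes from. The rounding error is bounded by half the scale per value, and small weights such as 0.003 above can collapse to zero, which is one plausible source of the accuracy gap seen in the benchmarks.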

Intended Use

This model is intended for transcribing speech in Indic languages into text. It can be used for applications such as voice assistants, transcription services, and accessibility tools.

Usage

Use the notebook:

Installation

To use this model, simply install the helper package:

pip install indic-asr-onnx

Loading the Model

from indic_asr_onnx import IndicTranscriber

# Initialize (downloads model automatically)
transcriber = IndicTranscriber()

Inference

# Transcribe audio using CTC head
text = transcriber.transcribe_ctc("audio.wav", "hi")  # Hindi
print(text)

# Transcribe audio using RNNT head
text = transcriber.transcribe_rnnt("audio.wav", "hi")  # Hindi
print(text)
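
Since the CTC and RNNT heads can disagree on the same audio, it can be handy to collect both transcripts in one call. The helper below is a small sketch built only on the two methods shown above; `transcribe_with_both_heads` is a hypothetical name, not part of the indic-asr-onnx package.

```python
def transcribe_with_both_heads(transcriber, audio_path, lang):
    """Return the CTC and RNNT transcripts for one audio file as a dict.

    `transcriber` is expected to expose transcribe_ctc and transcribe_rnnt
    with the (audio_path, lang) arguments used in the examples above.
    """
    return {
        "ctc": transcriber.transcribe_ctc(audio_path, lang),
        "rnnt": transcriber.transcribe_rnnt(audio_path, lang),
    }


# Example usage (requires the package and an audio file):
# from indic_asr_onnx import IndicTranscriber
# results = transcribe_with_both_heads(IndicTranscriber(), "audio.wav", "hi")
# print(results["ctc"])
# print(results["rnnt"])
```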

Model Files

Config Subfolder

vocab.json: Subword vocabulary for supported languages
language_masks.json: Language-specific masks for handling multilingual inputs

ONNX Subfolder

ctc_decoder_quantized_int8.onnx: Quantized CTC decoder head
encoder_quantized_int8.onnx: Quantized Conformer encoder for feature extraction from audio
joint_enc_quantized_int8.onnx: Quantized joint encoder component for RNN-T decoding
joint_pre_net_quantized_int8.onnx: Quantized joint pre-net for preprocessing in RNN-T
joint_pred_quantized_int8.onnx: Quantized joint predictor for RNN-T decoding
rnnt_decoder_quantized_int8.onnx: Quantized RNN-T decoder
adapters/*: Language-specific quantized joint post-net adapters for each supported language (e.g., joint_post_net_hi_quantized_int8.onnx for Hindi)
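
If you work with the files directly rather than through the helper package, the adapter file name can be derived from the language code. The sketch below assumes that every supported language follows the same naming pattern as the Hindi example above; that pattern is only confirmed for Hindi, and `adapter_filename` is an illustrative name.

```python
# Language codes as listed under Model Details.
SUPPORTED_LANGUAGES = (
    "as", "bn", "brx", "doi", "gu", "hi", "kn", "kok", "mai", "ml",
    "mr", "ne", "or", "pa", "sa", "sat", "sd", "ta", "te", "ur",
)

# Assumed pattern, generalized from joint_post_net_hi_quantized_int8.onnx.
ADAPTER_TEMPLATE = "joint_post_net_{lang}_quantized_int8.onnx"


def adapter_filename(lang):
    """Return the assumed adapter file name for a supported language code."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {lang!r}")
    return ADAPTER_TEMPLATE.format(lang=lang)
```

Validating the code up front gives a clear error for an unsupported language instead of a missing-file failure later.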

Training Data

Calibration Dataset: https://www.kaggle.com/datasets/haposeiz/indicvoices-calibration-1408

The calibration dataset was curated from the IndicVoices dataset.

Additional Links

GitHub: https://github.com/atharva-again/indic-asr-onnx

Contact

For questions or issues, you can open an issue on this repository or on GitHub, or email me at atharva.verma18@gmail.com.

Model tree for atharva-again/indic-conformer-600m-quantized

Base model: ai4bharat/indic-conformer-600m-multilingual