Parakeet CTC 1.1B – Persian (Farsi)

This model is a fine-tuned version of NVIDIA's Parakeet CTC 1.1B for Persian (Farsi) Automatic Speech Recognition (ASR).

The model was fine-tuned with the NVIDIA NeMo toolkit.

Model Details

Architecture: Parakeet (Conformer-based) with CTC decoder
Parameters: ~1.1 Billion
Language: Persian (Farsi)
Framework: NVIDIA NeMo
Tokenizer: SentencePiece (BPE)

Usage

1. Install Dependencies

pip install "nemo_toolkit[asr]"

2. Inference (Transcribing Audio)

import nemo.collections.asr as nemo_asr

# Load the model from Hugging Face
model = nemo_asr.models.ASRModel.from_pretrained("MohammadGholizadeh/parakeet-ctc-1.1b-persian.nemo")

# Transcribe a single audio file
files = ["/path/to/your/audio_file.wav"]
transcriptions = model.transcribe(files)

print(transcriptions[0])
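
Depending on the installed NeMo release, the items returned by `transcribe` may be plain strings or hypothesis objects that carry the transcript in a `text` attribute. This is an assumption about version differences, so check it against your install. The sketch below shows one way to handle both cases. The `to_text` helper and the `FakeHypothesis` stand-in are hypothetical names used only for illustration and are not part of NeMo.

```python
from dataclasses import dataclass


@dataclass
class FakeHypothesis:
    """Hypothetical stand-in for a hypothesis object with a `text` field."""
    text: str


def to_text(result) -> str:
    """Return the transcript string from one item of a `transcribe` result."""
    if isinstance(result, str):
        # Some releases return the transcript directly as a string.
        return result
    # Otherwise assume an object exposing `.text`; fall back to str().
    return getattr(result, "text", str(result))


# Both shapes yield the same plain string.
print(to_text("سلام دنیا"))
print(to_text(FakeHypothesis(text="سلام دنیا")))
```

With the snippet above, `print(to_text(transcriptions[0]))` would print the transcript in either case.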

Fine-Tuning

This repository includes the unpacked tokenizer files (tokenizer.json, vocab.json, etc.) compatible with Hugging Face Transformers.

You can use these files to:

Continue fine-tuning the model
Reuse the tokenizer in other frameworks or experiments

The tokenizer is a SentencePiece BPE model with a vocabulary tailored for this Persian fine-tune.

Input / Output

Input: 16 kHz mono audio (WAV or FLAC)
Output: Transcribed Persian text
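
Before transcribing, it can help to confirm that a file matches the input format listed above. The following is a minimal sketch using only Python's standard `wave` module, so it covers uncompressed PCM WAV files and not FLAC. The function name `wav_matches_spec` and the constants are illustrative assumptions, not part of this repository.

```python
import wave

EXPECTED_RATE = 16_000   # Hz, per the input spec above
EXPECTED_CHANNELS = 1    # mono


def wav_matches_spec(path: str) -> bool:
    """Return True if the PCM WAV file at `path` is 16 kHz mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

Files that fail this check would need to be resampled or downmixed with an audio tool of your choice before being passed to the model.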

Limitations

Performance may degrade with heavy background noise or overlapping speech
Accuracy depends on the domain of the training data (formal vs. conversational)

Limitations & Future Potential

Due to limited computational resources, training could not be continued further. However, early results indicate strong potential.

With access to larger datasets and extended training time, the model could potentially reach a word error rate (WER) of 7–8%, comparable to strong English ASR models.
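
For reference, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. A WER of 7–8% therefore corresponds to roughly 7 to 8 word errors per 100 reference words. The sketch below is a plain-Python illustration of the metric, not the evaluation script used for this model; the function name `word_error_rate` is an assumption, and it splits on whitespace with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))
```

For Persian text, results will depend on how the transcripts are normalized (for example, handling of the zero-width non-joiner), so the same normalization should be applied to both reference and hypothesis before scoring.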

Call for Sponsors / Computational Resources

I am highly motivated to continue developing this model and improve its performance.

If you or your organization can provide GPU/TPU resources or sponsorship, please feel free to reach out.

Citation

If you use this model or tokenizer in your research, please cite:

@misc{parakeet_fa_2025,
  title        = {Persian Automatic Speech Recognition with Parakeet CTC},
  author       = {Gholizadeh, Mohammad Sadegh and Jamshidi, Pooyan},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MohammadGholizadeh/parakeet-ctc-1.1b-persian.nemo}},
  note         = {Training support provided by Prof. Pooyan Jamshidi (pjamshid@cse.sc.edu)}
}
