Parakeet CTC 1.1B – Persian (Farsi)

This model is a fine-tuned version of NVIDIA's Parakeet CTC 1.1B for Persian (Farsi) Automatic Speech Recognition (ASR).

The model was fine-tuned with the NVIDIA NeMo toolkit.

Model Details

  • Architecture: Parakeet (Conformer-based) with CTC decoder
  • Parameters: ~1.1 Billion
  • Language: Persian (Farsi)
  • Framework: NVIDIA NeMo
  • Tokenizer: SentencePiece (BPE)

Usage

1. Install Dependencies

pip install "nemo_toolkit[asr]"

2. Inference (Transcribing Audio)

import nemo.collections.asr as nemo_asr

# Load the model from Hugging Face
model = nemo_asr.models.ASRModel.from_pretrained("MohammadGholizadeh/parakeet-ctc-1.1b-persian.nemo")

# Transcribe a single audio file
files = ["/path/to/your/audio_file.wav"]
transcriptions = model.transcribe(files)

print(transcriptions[0])
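
To transcribe several files at once, the same call can be wrapped in a small helper. The sketch below is a convenience under stated assumptions, not part of this repository: `collect_audio_files`, `to_text`, and `transcribe_directory` are hypothetical names, and the helper accepts any object that exposes a `transcribe(list_of_paths)` method. Depending on the NeMo version, `transcribe` may return plain strings or hypothesis objects with a `.text` attribute, so the helper handles both.

```python
from pathlib import Path


def collect_audio_files(directory, extensions=(".wav", ".flac")):
    """Return sorted paths (as strings) of audio files directly inside `directory`."""
    root = Path(directory)
    return sorted(
        str(p) for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


def to_text(result):
    """Normalize one transcription result to a plain string.

    Assumption: results are either str or objects carrying a `.text` attribute.
    """
    return result if isinstance(result, str) else getattr(result, "text", str(result))


def transcribe_directory(model, directory):
    """Transcribe every audio file in `directory` and return {path: text}."""
    files = collect_audio_files(directory)
    if not files:
        return {}
    results = model.transcribe(files)
    return {path: to_text(res) for path, res in zip(files, results)}
```

With the model loaded as above, `transcribe_directory(model, "/path/to/audio_dir")` returns a dictionary that maps each file path to its transcript.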

Fine-Tuning

This repository includes the unpacked tokenizer files (tokenizer.json, vocab.json, etc.) compatible with Hugging Face Transformers.

You can use these files to:

  • Continue fine-tuning the model
  • Reuse the tokenizer in other frameworks or experiments

The tokenizer is a SentencePiece BPE model with a vocabulary tailored for this Persian fine-tune.
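
To continue fine-tuning with NeMo, training data is usually described in a JSON-lines manifest in which each line holds `audio_filepath`, `duration` (in seconds), and `text`. The sketch below builds such a manifest from WAV files using only the Python standard library. The helper names `wav_duration_seconds` and `write_manifest` are hypothetical, and the training configuration itself is outside the scope of this example.

```python
import json
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(pairs, manifest_path):
    """Write one JSON object per line for each (wav_path, transcript) pair."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav_path, transcript in pairs:
            entry = {
                "audio_filepath": wav_path,
                "duration": round(wav_duration_seconds(wav_path), 3),
                "text": transcript,
            }
            # ensure_ascii=False keeps Persian text readable in the file.
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

A call such as `write_manifest([("/path/to/clip.wav", "transcript text")], "train_manifest.json")` writes one manifest line per clip; both paths here are placeholders.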

Input / Output

  • Input: 16 kHz mono audio (WAV or FLAC)
  • Output: Transcribed Persian text
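
Before transcribing, it can help to confirm that a WAV file actually matches this input format. A minimal sketch using Python's standard `wave` module follows. The names `wav_format` and `is_16k_mono` are hypothetical, the check only reads PCM WAV headers (not FLAC), and it does not resample anything.

```python
import wave

EXPECTED_RATE = 16000   # sample rate in Hz
EXPECTED_CHANNELS = 1   # mono


def wav_format(path):
    """Return (sample_rate_hz, channel_count) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def is_16k_mono(path):
    """True if the WAV file is 16 kHz mono, the format listed above."""
    rate, channels = wav_format(path)
    return rate == EXPECTED_RATE and channels == EXPECTED_CHANNELS
```

Files that fail the check would need to be resampled or downmixed with an audio tool of your choice before transcription.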

Limitations

  • Performance may degrade with heavy background noise or overlapping speech
  • Accuracy depends on how closely the input matches the domain of the training data (e.g., formal vs. conversational speech)

Future Potential

Training was stopped early because of limited computational resources. Early results nevertheless indicate strong potential.

With larger datasets and longer training, the model could plausibly reach a word error rate (WER) of 7–8%, in the range of strong English ASR models. This figure is a projection, not a measured result.
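
For reference, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substituted, deleted, and inserted words and N is the reference length. The sketch below is a plain-Python illustration of that definition with a hypothetical `word_error_rate` helper. It is not the evaluation script used for this model, and it applies no text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, one substituted word in a four-word reference gives a WER of 0.25 (25%).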

Call for Sponsors / Computational Resources

I am highly motivated to continue developing this model and improve its performance.

If you or your organization can provide GPU/TPU resources or sponsorship, please feel free to reach out.

Citation

If you use this model or tokenizer in your research, please cite:

@misc{parakeet_fa_2025,
  title        = {Persian Automatic Speech Recognition with Parakeet CTC},
  author       = {Gholizadeh, Mohammad Sadegh and Jamshidi, Pooyan},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/MohammadGholizadeh/parakeet-ctc-1.1b-persian.nemo}},
  note         = {Training support provided by Prof. Pooyan Jamshidi (pjamshid@cse.sc.edu)}
}