Parakeet CTC 1.1B - Persian (Farsi)
This model is a fine-tuned version of NVIDIA's Parakeet CTC 1.1B for Persian (Farsi) Automatic Speech Recognition (ASR).
The model was trained and fine-tuned using the NVIDIA NeMo toolkit.
Model Details
- Architecture: Parakeet (Conformer-based) with CTC decoder
- Parameters: ~1.1 billion
- Language: Persian (Farsi)
- Framework: NVIDIA NeMo
- Tokenizer: SentencePiece (BPE)
Usage
1. Install Dependencies
pip install "nemo_toolkit[asr]"
2. Inference (Transcribing Audio)
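import nemo.collections.asr as nemo_asr

# Load the model from Hugging Face
model = nemo_asr.models.ASRModel.from_pretrained("MohammadGholizadeh/parakeet-ctc-1.1b-persian.nemo")

# Transcribe a single audio file (16 kHz mono WAV or FLAC)
files = ["/path/to/your/audio_file.wav"]
transcriptions = model.transcribe(files)

# Depending on the NeMo version, each result is either a plain string
# or a Hypothesis object that exposes the transcript as `.text`.
first = transcriptions[0]
print(getattr(first, "text", first))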
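To transcribe more than one file, the same `transcribe` call can be wrapped in a small helper. The following is a minimal sketch, not part of this repository: `transcribe_directory` and `AUDIO_SUFFIXES` are hypothetical names, and the helper assumes `model` is the NeMo model loaded above and that its `transcribe` method accepts a list of paths plus a `batch_size` argument.

```python
from pathlib import Path

# Hypothetical helper names; only WAV and FLAC are collected,
# matching the input formats listed in this model card.
AUDIO_SUFFIXES = {".wav", ".flac"}


def transcribe_directory(model, audio_dir, batch_size=4):
    """Transcribe every WAV/FLAC file in audio_dir and return {filename: text}."""
    paths = sorted(
        str(p) for p in Path(audio_dir).iterdir()
        if p.suffix.lower() in AUDIO_SUFFIXES
    )
    if not paths:
        return {}
    # Assumption: model.transcribe takes a list of paths and a batch_size.
    results = model.transcribe(paths, batch_size=batch_size)
    # Results may be plain strings or objects with a `.text` attribute,
    # depending on the NeMo version; normalize both to strings.
    texts = [getattr(r, "text", r) for r in results]
    return {Path(p).name: t for p, t in zip(paths, texts)}
```

Calling `transcribe_directory(model, "/path/to/audio_dir")` would then return one transcript per audio file, keyed by filename. A smaller `batch_size` may help if GPU memory is limited.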
Fine-Tuning
This repository includes the unpacked tokenizer files (tokenizer.json, vocab.json, etc.) compatible with Hugging Face Transformers.
You can use these files to:
- Continue fine-tuning the model
- Reuse the tokenizer in other frameworks or experiments
The tokenizer is a SentencePiece BPE model with a vocabulary tailored for this Persian fine-tune.
Input / Output
- Input: 16 kHz mono audio (WAV or FLAC)
- Output: Transcribed Persian text
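Before transcribing, it can be useful to confirm that a file matches the expected input format. The sketch below uses only Python's standard `wave` module, so it covers uncompressed PCM WAV files only (not FLAC). The function name `check_wav_format` is hypothetical and not part of NeMo or this repository.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return (ok, sample_rate, channels) for a PCM WAV file.

    `ok` is True only when the file is 16 kHz mono by default.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    ok = rate == expected_rate and channels == expected_channels
    return ok, rate, channels
```

If the check fails, the audio would need to be resampled to 16 kHz and downmixed to mono with a tool of your choice before being passed to the model.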
Limitations
- Performance may degrade with heavy background noise or overlapping speech
- Accuracy depends on the domain of the training data (formal vs. conversational)
Limitations & Future Potential
Limited computational resources prevented further training, but early results indicate strong potential.
With larger datasets and longer training, the model could reach a word error rate (WER) of 7-8%, comparable to strong English ASR models.
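For reference, WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of reference words. A WER of 7-8% therefore means roughly 7 to 8 word errors per 100 reference words. The sketch below is a plain-Python illustration of that definition, not the evaluation script used for this model. The function name `word_error_rate` is hypothetical, and it splits on whitespace without any Persian-specific text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("من به خانه رفتم", "من خانه رفتم")` gives 0.25: one word is missing out of four reference words.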
Call for Sponsors / Computational Resources
I am highly motivated to continue developing this model and improve its performance.
If you or your organization can provide GPU/TPU resources or sponsorship, please feel free to reach out.
Citation
If you use this model or tokenizer in your research, please cite:
@misc{parakeet_fa_2025,
title = {Persian Automatic Speech Recognition with Parakeet CTC},
author = {Gholizadeh, Mohammad Sadegh and Jamshidi, Pooyan},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/MohammadGholizadeh/parakeet-ctc-1.1b-persian.nemo}},
note = {Training support provided by Prof. Pooyan Jamshidi (pjamshid@cse.sc.edu)}
}