Urdu-Punjabi Language Learning Model V2
This is a fine-tuned XLM-RoBERTa model for:
- Answer Scoring: Evaluating user responses in Urdu and Pakistani Punjabi
- Grammar Checking: Validating sentence structure
- Translation Validation: Checking translation accuracy
Model Details
- Base Model: xlm-roberta-base
- Languages: Urdu (Nastaliq), Pakistani Punjabi (Shahmukhi), English
- Task: Regression (score 0.0 to 1.0)
- Fine-tuned on: Custom vocabulary dataset (1000 words)
Dataset
The model was trained on a custom dataset containing:
- 20 Chapters (10 Urdu + 10 Punjabi)
- 50 words per chapter = 1000 vocabulary items
- 100 Quiz MCQ Questions (5 per chapter)
- 100 User Input Questions (5 per chapter)
- 1000+ Grammar Examples (sentence-translation pairs)
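As a quick sanity check, the totals listed above follow from the per-chapter counts. The snippet below is only a worked calculation; the constant and key names are illustrative and do not come from the training code.

```python
# Per-chapter counts taken from the dataset description above.
CHAPTERS_PER_LANGUAGE = 10
LANGUAGES = ("Urdu", "Punjabi")
WORDS_PER_CHAPTER = 50
MCQ_PER_CHAPTER = 5
USER_INPUT_PER_CHAPTER = 5

total_chapters = CHAPTERS_PER_LANGUAGE * len(LANGUAGES)

# Hypothetical summary dict; the key names are not from the original project.
dataset_totals = {
    "chapters": total_chapters,
    "vocabulary_items": total_chapters * WORDS_PER_CHAPTER,
    "mcq_questions": total_chapters * MCQ_PER_CHAPTER,
    "user_input_questions": total_chapters * USER_INPUT_PER_CHAPTER,
}

for name, count in dataset_totals.items():
    print(f"{name}: {count}")
```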
Chapter Topics:
- Greetings & Polite Expressions (سلام و آداب)
- Family & Relationships (خاندان اور رشتے)
- Food & Dining (کھانا اور خوراک)
- Numbers & Counting (گنتی اور اعداد)
- Places & Locations (جگہیں اور مقامات)
- Shopping & Money (خریداری اور پیسے)
- Emotions & Feelings (جذبات اور احساسات)
- Weather & Nature (موسم اور فطرت)
- Body Parts & Health (جسم کے اعضاء اور صحت)
- Education & Learning (تعلیم اور سیکھنا)
Usage
```python
import torch
from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification

# Load the tokenizer and the fine-tuned model
model_name = "RAFAY-484/Urdu-Punjabi-V2"
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
model = XLMRobertaForSequenceClassification.from_pretrained(model_name)

# Score an answer by encoding the expected answer and the user input as a pair
expected = "خوش"  # "Happy" in Urdu/Punjabi
user_input = "خوش"
inputs = tokenizer(expected, user_input, return_tensors="pt")

# Disable gradient tracking for inference
with torch.no_grad():
    outputs = model(**inputs)

# Map the single regression logit to the 0.0 to 1.0 range
score = torch.sigmoid(outputs.logits).item()
print(f"Score: {score:.3f}")  # An exact match should land in the 0.9 to 1.0 band
```
API Usage (Flutter/Mobile App)
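```dart
import 'dart:convert';

import 'package:http/http.dart' as http;

/// Sends an expected/user answer pair to the hosted model and returns its score.
Future<double> scoreAnswer(String expected, String userInput) async {
  final response = await http.post(
    Uri.parse('https://api-inference.huggingface.co/models/RAFAY-484/Urdu-Punjabi-V2'),
    headers: {
      'Authorization': 'Bearer YOUR_HF_TOKEN',
      'Content-Type': 'application/json',
    },
    body: jsonEncode({
      // Note: XLM-RoBERTa's own separator token is </s>, so [SEP] is sent as plain text.
      'inputs': '$expected [SEP] $userInput',
    }),
  );
  // Fail loudly on errors (for example, while the model is still loading).
  if (response.statusCode != 200) {
    throw Exception('Scoring request failed: ${response.statusCode}');
  }
  final result = jsonDecode(response.body);
  // JSON numbers may decode as int or double, so convert explicitly.
  return (result[0]['score'] as num).toDouble();
}
```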
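For clients written in Python, the response handling can be sketched separately from the HTTP call, which makes it easy to test offline. This is a minimal sketch under assumptions: `extract_score` is a hypothetical helper, and the three response shapes it handles (a flat list of results, a list nested one level deeper, and an error object with an `error` key) are assumptions about what the endpoint may return, not documented behavior of this model.

```python
import json


def extract_score(response_body: str) -> float:
    """Return the first `score` found in a JSON response body."""
    payload = json.loads(response_body)

    # Assumed error shape: a JSON object such as {"error": "..."}.
    if isinstance(payload, dict):
        raise ValueError(f"API error: {payload.get('error', payload)}")

    first = payload[0]
    # Assumed alternative shape: one inner list of results per input.
    if isinstance(first, list):
        first = first[0]

    # Scores may arrive as int or float, so convert explicitly.
    return float(first["score"])


print(extract_score('[{"label": "LABEL_0", "score": 0.93}]'))
```

Raising on the error shape keeps a failed request from being silently read as a low score.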
Scoring Guide
| Score Range | Meaning | Example |
|---|---|---|
| 0.9 - 1.0 | Exact match | Expected: خوش, User: خوش |
| 0.7 - 0.9 | Close match | Expected: السلام علیکم, User: السلام |
| 0.4 - 0.7 | Partial match | Expected: شکریہ, User: شکر |
| 0.2 - 0.4 | Related word | Expected: کتاب, User: book |
| 0.0 - 0.2 | Incorrect | Expected: پیار, User: نفرت |
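An application will usually want to turn the numeric score into one of the labels in the table. The sketch below is one way to do that; `describe_score` is a hypothetical helper, and because the table's ranges share their endpoints, it assumes a boundary value (for example exactly 0.9) belongs to the higher band.

```python
def describe_score(score: float) -> str:
    """Map a model score in [0.0, 1.0] to a label from the scoring guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")

    # Assumption: shared boundaries are assigned to the higher band.
    if score >= 0.9:
        return "Exact match"
    if score >= 0.7:
        return "Close match"
    if score >= 0.4:
        return "Partial match"
    if score >= 0.2:
        return "Related word"
    return "Incorrect"


print(describe_score(0.95))
print(describe_score(0.5))
```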
Vocabulary Samples
Urdu
| Word | Translation | Pronunciation |
|---|---|---|
| شکریہ | Thank you | shukriya |
| خوش آمدید | Welcome | khush aamdeed |
| براہ کرم | Please | barah-e-karam |
Pakistani Punjabi (Shahmukhi)
| Word | Translation | Pronunciation |
|---|---|---|
| جی آیاں نوں | Welcome | ji aayan nu |
| ودھیا | Very good | wadiya |
| سوہنا | Beautiful | sohna |
Important Note on Punjabi
⚠️ This model covers Pakistani Punjabi written in the Shahmukhi script (Perso-Arabic based). It does NOT include Punjabi vocabulary written in the Gurmukhi script, as used in Indian Punjab.
All Punjabi vocabulary is authentic Pakistani Punjabi as spoken in Punjab, Pakistan.
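Since Gurmukhi input is out of scope, an application may want to detect it before sending text to the model. The check below is a minimal sketch and not part of the model itself: `contains_gurmukhi` is a hypothetical helper that simply tests whether any character falls in the Unicode Gurmukhi block (U+0A00 to U+0A7F).

```python
# Unicode Gurmukhi block boundaries.
GURMUKHI_START = 0x0A00
GURMUKHI_END = 0x0A7F


def contains_gurmukhi(text: str) -> bool:
    """Return True if any character in `text` is in the Gurmukhi block."""
    return any(GURMUKHI_START <= ord(ch) <= GURMUKHI_END for ch in text)


print(contains_gurmukhi("سوہنا"))   # Shahmukhi spelling
print(contains_gurmukhi("ਸੋਹਣਾ"))  # Gurmukhi spelling
```

A caller could use this to show a script hint to the learner instead of returning a low score.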
Training Configuration
- Epochs: 2
- Batch Size: 32
- Learning Rate: 2e-05
- Max Length: 128
- Training Samples: 3386
- Validation Samples: 598
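The figures above imply a fairly short training run. As a worked calculation, assuming a single device, no gradient accumulation, and that the last partial batch is kept (none of which the configuration states), the number of optimizer steps and the validation share come out as follows.

```python
import math

# Values taken from the training configuration above.
EPOCHS = 2
BATCH_SIZE = 32
TRAIN_SAMPLES = 3386
VAL_SAMPLES = 598

# Assumption: the final partial batch is kept, so round up.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
val_fraction = VAL_SAMPLES / (TRAIN_SAMPLES + VAL_SAMPLES)

print(f"steps per epoch: {steps_per_epoch}")    # 106
print(f"total steps: {total_steps}")            # 212
print(f"validation share: {val_fraction:.1%}")  # 15.0%
```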
Use Cases
- Language Learning Apps: Score user responses in vocabulary exercises
- Quiz Systems: Validate answers in MCQ and fill-in-the-blank questions
- Grammar Checking: Evaluate sentence correctness
- Translation Apps: Verify translation accuracy
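For the quiz use case, per-answer scores need to be aggregated into a result. The sketch below shows one possible shape for that logic. Everything in it is hypothetical: `grade_quiz`, the 0.7 pass threshold (chosen here to match the lower edge of the "Close match" band), and `exact_match_scorer`, which stands in for a real call to the model so the example runs offline.

```python
from typing import Callable, Dict, Iterable, Tuple

PASS_THRESHOLD = 0.7  # hypothetical cut-off, not specified by the model card


def grade_quiz(
    pairs: Iterable[Tuple[str, str]],
    scorer: Callable[[str, str], float],
    threshold: float = PASS_THRESHOLD,
) -> Dict[str, float]:
    """Count how many (expected, answer) pairs score at or above the threshold."""
    passed = [scorer(expected, answer) >= threshold for expected, answer in pairs]
    correct = sum(passed)
    total = len(passed)
    accuracy = correct / total if total else 0.0
    return {"correct": correct, "total": total, "accuracy": accuracy}


def exact_match_scorer(expected: str, answer: str) -> float:
    """Stand-in scorer for illustration; a real app would query the model."""
    return 1.0 if expected.strip() == answer.strip() else 0.0


quiz = [("خوش", "خوش"), ("پیار", "نفرت")]
print(grade_quiz(quiz, exact_match_scorer))
```

Passing the scorer in as a function keeps the grading logic independent of whether scores come from a local model or the hosted API.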
Citation
@misc{urdu-punjabi-v2-2024,
author = {RAFAY-484},
title = {Urdu-Punjabi Language Learning Model V2},
year = {2024},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/RAFAY-484/Urdu-Punjabi-V2}}
}
License
MIT License - Free for educational and commercial use.