Urdu-Punjabi Language Learning Model V2

This is a fine-tuned XLM-RoBERTa model for:

  • Answer Scoring: Evaluating user responses in Urdu and Pakistani Punjabi
  • Grammar Checking: Validating sentence structure
  • Translation Validation: Checking translation accuracy

Model Details

  • Base Model: xlm-roberta-base
  • Languages: Urdu (Nastaliq), Pakistani Punjabi (Shahmukhi), English
  • Task: Regression (score 0.0 to 1.0)
  • Fine-tuned on: Custom vocabulary dataset (1000 words)
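
The Usage snippet in this card passes the model's raw output through a sigmoid to obtain the 0.0 to 1.0 score. As a rough illustration of that mapping only, here is a minimal pure-Python sketch; `logit_to_score` is a hypothetical helper name, and whether the model was trained against sigmoid-squashed targets is an assumption taken from the Usage code.

```python
import math


def logit_to_score(logit: float) -> float:
    """Map a raw regression logit to the (0, 1) range with a sigmoid.

    Hypothetical helper: mirrors the torch.sigmoid call in the Usage section.
    The two branches avoid overflow in math.exp for large-magnitude inputs.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


if __name__ == "__main__":
    for raw in (-3.0, 0.0, 3.0):
        print(f"logit {raw:+.1f} -> score {logit_to_score(raw):.3f}")
```

A logit of 0 maps to 0.5, so strongly positive logits correspond to matches and strongly negative ones to incorrect answers.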

Dataset

The model was trained on a custom dataset consisting of:

  • 20 Chapters (10 Urdu + 10 Punjabi)
  • 50 words per chapter = 1000 vocabulary items
  • 100 Quiz MCQ Questions (5 per chapter)
  • 100 User Input Questions (5 per chapter)
  • 1000+ Grammar Examples (sentence-translation pairs)

Chapter Topics:

  1. Greetings & Polite Expressions (سلام و آداب)
  2. Family & Relationships (خاندان اور رشتے)
  3. Food & Dining (کھانا اور خوراک)
  4. Numbers & Counting (گنتی اور اعداد)
  5. Places & Locations (جگہیں اور مقامات)
  6. Shopping & Money (خریداری اور پیسے)
  7. Emotions & Feelings (جذبات اور احساسات)
  8. Weather & Nature (موسم اور فطرت)
  9. Body Parts & Health (جسم کے اعضاء اور صحت)
  10. Education & Learning (تعلیم اور سیکھنا)

Usage

Python

from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification
import torch

# Load model
model_name = "RAFAY-484/Urdu-Punjabi-V2"
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
model = XLMRobertaForSequenceClassification.from_pretrained(model_name)

# Score an answer
expected = "خوش"  # Happy in Urdu/Punjabi
user_input = "خوش"

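# Encode the two strings as a sentence pair; 128 matches the training max length
inputs = tokenizer(expected, user_input, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
    # Squash the single regression logit into the 0.0 to 1.0 score range
    score = torch.sigmoid(outputs.logits).item()

print(f"Score: {score:.3f}")  # An exact match should score 0.9 or higher (see Scoring Guide)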

API Usage (Flutter/Mobile App)

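import 'dart:convert';

import 'package:http/http.dart' as http;

// Sends the expected/user answer pair to the hosted model and returns its score.
Future<double> scoreAnswer(String expected, String userInput) async {
  final response = await http.post(
    Uri.parse('https://api-inference.huggingface.co/models/RAFAY-484/Urdu-Punjabi-V2'),
    headers: {
      'Authorization': 'Bearer YOUR_HF_TOKEN',
      'Content-Type': 'application/json',
    },
    body: jsonEncode({
      'inputs': '$expected [SEP] $userInput',
    }),
  );

  // Fail loudly on errors such as an invalid token or an unavailable model
  if (response.statusCode != 200) {
    throw Exception('Inference request failed: ${response.statusCode} ${response.body}');
  }

  final result = jsonDecode(response.body);
  // JSON numbers decode as num, so convert explicitly to double
  return (result[0]['score'] as num).toDouble();
}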

Scoring Guide

Score Range | Meaning | Example
0.9 - 1.0 | Exact match | Expected: خوش, User: خوش
0.7 - 0.9 | Close match | Expected: السلام علیکم, User: السلام
0.4 - 0.7 | Partial match | Expected: شکریہ, User: شکر
0.2 - 0.4 | Related word | Expected: کتاب, User: book
0.0 - 0.2 | Incorrect | Expected: پیار, User: نفرت
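
An app will usually want to turn the numeric score into one of the labels above. The following is a minimal sketch of that lookup; `describe_score` is a hypothetical helper, and since the ranges in the guide share their endpoints, it assumes a boundary value (for example exactly 0.9) belongs to the higher band.

```python
def describe_score(score: float) -> str:
    """Return the Scoring Guide label for a score between 0.0 and 1.0.

    Assumption: boundary values fall into the higher band, because the
    guide's ranges overlap at their endpoints.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score >= 0.9:
        return "Exact match"
    if score >= 0.7:
        return "Close match"
    if score >= 0.4:
        return "Partial match"
    if score >= 0.2:
        return "Related word"
    return "Incorrect"


if __name__ == "__main__":
    for value in (0.95, 0.75, 0.5, 0.3, 0.1):
        print(f"{value:.2f}: {describe_score(value)}")
```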

Vocabulary Samples

Urdu

Word | Translation | Pronunciation
شکریہ | Thank you | shukriya
خوش آمدید | Welcome | khush aamdeed
براہ کرم | Please | barah-e-karam

Pakistani Punjabi (Shahmukhi)

Word | Translation | Pronunciation
جی آیاں نوں | Welcome | ji aayan nu
ودھیا | Very good | wadiya
سوہنا | Beautiful | sohna

Important Note on Punjabi

⚠️ This model covers Pakistani Punjabi written in the Shahmukhi script (Arabic-based). It does NOT include Punjabi vocabulary written in the Gurmukhi script.

All Punjabi vocabulary is authentic Pakistani Punjabi as spoken in Punjab, Pakistan.
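
Because Gurmukhi input falls outside the model's vocabulary, an app may want to detect it before scoring. The sketch below is one possible pre-check based on Unicode blocks; `detect_script` is a hypothetical helper and not part of this model or of the transformers library. It treats the Arabic block (U+0600 to U+06FF) and Arabic Supplement (U+0750 to U+077F) as Shahmukhi/Nastaliq text and the Gurmukhi block (U+0A00 to U+0A7F) as Gurmukhi.

```python
def detect_script(text: str) -> str:
    """Classify text as 'arabic', 'gurmukhi', 'mixed', or 'other'.

    Hypothetical pre-check: only inspects Unicode code point ranges,
    so it cannot tell Urdu from Shahmukhi Punjabi.
    """
    has_arabic = any(
        0x0600 <= ord(ch) <= 0x06FF or 0x0750 <= ord(ch) <= 0x077F
        for ch in text
    )
    has_gurmukhi = any(0x0A00 <= ord(ch) <= 0x0A7F for ch in text)
    if has_arabic and has_gurmukhi:
        return "mixed"
    if has_arabic:
        return "arabic"
    if has_gurmukhi:
        return "gurmukhi"
    return "other"


if __name__ == "__main__":
    print(detect_script("سوہنا"))   # Shahmukhi spelling
    print(detect_script("ਸੋਹਣਾ"))   # Gurmukhi spelling
    print(detect_script("book"))
```

An app could, for instance, show a "please type in Shahmukhi" hint whenever the result is `gurmukhi` instead of sending the text to the model.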

Training Configuration

  • Epochs: 2
  • Batch Size: 32
  • Learning Rate: 2e-05
  • Max Length: 128
  • Training Samples: 3386
  • Validation Samples: 598
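
The figures above imply a few derived quantities that are handy when reproducing the run. The sketch below works them out; it assumes a single device, no gradient accumulation, and that the last partial batch is kept, none of which is stated in this card. The dictionary keys are illustrative names only.

```python
import math

# Values copied from the Training Configuration list; key names are illustrative.
config = {
    "epochs": 2,
    "batch_size": 32,
    "learning_rate": 2e-05,
    "max_length": 128,
    "train_samples": 3386,
    "val_samples": 598,
}

# Optimizer steps per epoch, keeping the final partial batch (assumption).
steps_per_epoch = math.ceil(config["train_samples"] / config["batch_size"])
total_steps = steps_per_epoch * config["epochs"]

# Share of all labeled samples held out for validation.
val_fraction = config["val_samples"] / (
    config["train_samples"] + config["val_samples"]
)

if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch}")
    print(f"total steps: {total_steps}")
    print(f"validation fraction: {val_fraction:.3f}")
```

Under these assumptions the run is short (a little over two hundred optimizer steps), and the split is close to 85/15.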

Use Cases

  1. Language Learning Apps: Score user responses in vocabulary exercises
  2. Quiz Systems: Validate answers in MCQ and fill-in-the-blank questions
  3. Grammar Checking: Evaluate sentence correctness
  4. Translation Apps: Verify translation accuracy
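
For the quiz use case, the per-answer scores have to be turned into a pass/fail result. The sketch below shows one way to do that with an injected scoring function, so it runs without downloading the model. `grade_quiz` and `exact_match_scorer` are hypothetical names, and the 0.7 pass threshold is an assumption borrowed from the lower bound of "Close match" in the Scoring Guide.

```python
from typing import Callable, Iterable, Tuple


def grade_quiz(
    pairs: Iterable[Tuple[str, str]],
    scorer: Callable[[str, str], float],
    pass_threshold: float = 0.7,
) -> Tuple[int, int]:
    """Return (answers_passed, answers_total) for (expected, user_input) pairs.

    `scorer` is any function returning a 0.0 to 1.0 score, for example a
    wrapper around the model call shown in the Usage section.
    """
    results = [scorer(expected, answer) >= pass_threshold for expected, answer in pairs]
    return sum(results), len(results)


def exact_match_scorer(expected: str, user_input: str) -> float:
    """Stand-in scorer for demonstration: 1.0 on exact match, else 0.0."""
    return 1.0 if expected.strip() == user_input.strip() else 0.0


if __name__ == "__main__":
    answers = [("خوش", "خوش"), ("کتاب", "book")]
    passed, total = grade_quiz(answers, exact_match_scorer)
    print(f"{passed}/{total} answers passed")
```

Passing the scorer in as a parameter keeps the grading logic testable and lets the app swap between a local model and a remote API.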

Citation

@misc{urdu-punjabi-v2-2024,
  author = {RAFAY-484},
  title = {Urdu-Punjabi Language Learning Model V2},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/RAFAY-484/Urdu-Punjabi-V2}}
}

License

MIT License - Free for educational and commercial use.
