DistilBERT Taglish Fake News Detector (v1)
This model is based on distilbert-base-multilingual-cased and fine‑tuned on the Fake News Filipino (FNF) dataset, using:
- Cluster‑disjoint train/test splits to prevent topic leakage (see the sketch after this list)
- Taglish paraphrase augmentation (LLM‑generated Filipino–English code-mixed variants)

It is evaluated on:
- The full FNF test set
- A topic-balanced subset
- Mixed Taglish adversarial noise
- Real Taglish social-media posts (50 samples)
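
As a rough illustration of the cluster‑disjoint split (a sketch, not the actual training script), the snippet below keeps each topic cluster entirely in either train or test. The `cluster_ids` values and the split ratio are assumptions; in practice cluster IDs would come from an upstream topic-clustering step (e.g. k-means over sentence embeddings).

```python
# Minimal sketch of a cluster-disjoint split (illustrative only).
from sklearn.model_selection import GroupShuffleSplit

def cluster_disjoint_split(texts, labels, cluster_ids, test_size=0.2, seed=42):
    """Split so that every topic cluster lands entirely in train or entirely in test."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(texts, labels, groups=cluster_ids))
    train = [(texts[i], labels[i]) for i in train_idx]
    test = [(texts[i], labels[i]) for i in test_idx]
    return train, test

# Toy usage with hypothetical cluster IDs:
texts = ["balita A", "balita B", "balita C", "balita D"]
labels = [0, 1, 0, 1]
cluster_ids = [0, 0, 1, 1]
train, test = cluster_disjoint_split(texts, labels, cluster_ids, test_size=0.5)
```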
🔥 Why this model?
Among all tested models (RoBERTa-Tagalog, mBERT, XLM‑R, DistilBERT), DistilBERT + Taglish paraphrasing achieved the highest weighted F1 on real Taglish posts, making it well suited for browser extensions and on-device inference.
Real Taglish Weighted F1 Scores:
| Model Variant | Real Taglish Weighted F1 |
|---|---|
| RoBERTa Baseline | ~0.23 |
| RoBERTa + Paraphrasing | ~0.31 |
| RoBERTa + Adv | ~0.36 |
| mBERT | ~0.32 |
| XLM-R | ~0.35 |
| DistilBERT Baseline | ~0.36 |
| DistilBERT + Paraphrase (THIS MODEL) | ~0.41 (BEST) |
This indicates stronger robustness to noisy, code-switched Taglish found on social media platforms.
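
For reference, "weighted F1" here means per-class F1 averaged by class support, presumably as computed by scikit-learn's `f1_score(average="weighted")`; the labels below are hypothetical and only illustrate the metric.

```python
# Weighted F1: per-class F1 scores averaged by class support.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0]  # hypothetical gold labels (0 = real, 1 = fake)
y_pred = [1, 0, 0, 1, 0]  # hypothetical model predictions
print(f1_score(y_true, y_pred, average="weighted"))
```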
🚀 Intended Use
- Chrome browser extension
- Real-time fake news detection
- Noisy Taglish text (Facebook, TikTok, IG comments, etc.)
- Code-switched Filipino–English environments
Not a substitute for professional fact-checking.
📦 Labels
- 0 = real
- 1 = fake
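
A minimal inference sketch with 🤗 Transformers is shown below; the model ID is a placeholder for this repository's Hub path, and the 0/1 → real/fake mapping from the list above is applied manually in case `id2label` is not set in the config.

```python
# Minimal inference sketch (replace MODEL_ID with this repository's Hub id).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "your-username/distilbert-taglish-fake-news-v1"  # placeholder path
LABELS = {0: "real", 1: "fake"}  # label mapping from the list above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

text = "Grabe, totoo ba 'to? May bagong ayuda daw na ipapamigay bukas!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(LABELS[pred], logits.softmax(dim=-1).tolist())
```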