DistilBERT Taglish Fake News Detector (v1)

This model fine-tunes distilbert-base-multilingual-cased. Training and evaluation setup:

  • Fake News Filipino (FNF) dataset
  • Cluster‑disjoint splits to prevent topic leakage
  • Taglish paraphrase augmentation (LLM‑generated Filipino–English code-mixed variants)
  • Evaluated on:
    • Full FNF test set
    • Topic-balanced subset
    • Mixed Taglish adversarial noise
    • Real Taglish social-media posts (50 samples)
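
The cluster-disjoint splitting listed above can be illustrated with a minimal sketch. It assumes each article already carries a topic-cluster ID; how the clusters were actually derived is not stated in this card. The helper name `cluster_disjoint_split`, the 20% test fraction, and the seed are hypothetical choices for illustration only.

```python
import random
from collections import defaultdict

def cluster_disjoint_split(examples, test_fraction=0.2, seed=0):
    """Split (text, label, cluster_id) tuples so no cluster spans both splits.

    Hypothetical helper: the clustering method and ratio are assumptions,
    not the procedure used to train this model.
    """
    by_cluster = defaultdict(list)
    for example in examples:
        by_cluster[example[2]].append(example)

    # Shuffle cluster IDs (not individual examples) for a reproducible split.
    cluster_ids = sorted(by_cluster)
    random.Random(seed).shuffle(cluster_ids)

    # Hold out whole clusters until the test split reaches the target size.
    target = test_fraction * len(examples)
    train, test = [], []
    for cid in cluster_ids:
        bucket = test if len(test) < target else train
        bucket.extend(by_cluster[cid])
    return train, test
```

Because whole clusters are assigned to one side, an article about a given topic can never appear in training while a near-duplicate on the same topic sits in the test set, which is the leakage the split is meant to prevent.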

🔥 Why this model?

Among all tested models (RoBERTa-Tagalog, mBERT, XLM‑R, DistilBERT), DistilBERT with Taglish paraphrase augmentation achieved the highest weighted F1 on real Taglish posts. Combined with its small size, this makes it a good fit for browser extensions and on-device inference.

Real Taglish Weighted F1 Scores:

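| Model Variant                        | Real Taglish F1 |
|--------------------------------------|-----------------|
| RoBERTa Baseline                     | ~0.23           |
| RoBERTa + Paraphrasing               | ~0.31           |
| RoBERTa + Adv                        | ~0.36           |
| mBERT                                | ~0.32           |
| XLM-R                                | ~0.35           |
| DistilBERT Baseline                  | ~0.36           |
| DistilBERT + Paraphrase (THIS MODEL) | ~0.41 (BEST)    |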

This suggests stronger robustness to the noisy, code-switched Taglish found on social media platforms, although the real-world test set is small (50 posts).
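
For readers unfamiliar with the metric, weighted F1 is the mean of per-class F1 scores, weighted by how many true examples each class has. The sketch below is a plain-Python illustration intended to mirror scikit-learn's `f1_score(..., average="weighted")`. This card does not say which implementation produced the reported scores, so treat the function `weighted_f1` as an assumption for explanation only.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true examples per class
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1  # weight each class by its support
    return total / len(y_true)
```

With labels 0 = real and 1 = fake, `weighted_f1([0, 0, 1, 1], [0, 1, 1, 1])` averages an F1 of 2/3 for class 0 and 4/5 for class 1 with equal weights.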


🚀 Intended Use

  • Chrome browser extension
  • Real-time fake news detection
  • Noisy Taglish text (Facebook, TikTok, IG comments, etc.)
  • Code-switched Filipino–English environments

Not a substitute for professional fact-checking.

📦 Labels

  • 0 = real
  • 1 = fake
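
The label mapping above can be applied to raw model outputs as in the following sketch. It assumes the checkpoint loads through the standard Hugging Face `transformers` sequence-classification classes and that `torch` is installed. The `model_id` argument is a placeholder for this model's repository ID or a local path, which this card does not state, and the helper names `decode_logits` and `classify` are hypothetical.

```python
import math

ID2LABEL = {0: "real", 1: "fake"}  # mapping stated in this card

def decode_logits(logits):
    """Turn a pair of raw logits into (label, probability) via softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

def classify(text, model_id):
    """Classify one string; model_id is a placeholder repo ID or local path."""
    # Imported lazily so decode_logits works without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode_logits(logits)
```

Returning the probability alongside the label lets a caller such as a browser extension apply its own confidence threshold before flagging a post.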