---
language:
- tl
- en
tags:
- fake-news
- misinformation
- taglish
- code-switching
- filipino
- text-classification
pipeline_tag: text-classification
license: mit
---

# DistilBERT Taglish Fake News Detector (v1)

This model is based on **distilbert-base-multilingual-cased** and was fine-tuned with:

- **Fake News Filipino (FNF)** dataset
- **Cluster-disjoint splits** to prevent topic leakage
- **Taglish paraphrase augmentation** (LLM-generated Filipino–English code-mixed variants)

It was evaluated on:

- Full FNF test set
- Topic-balanced subset
- Mixed Taglish adversarial noise
- **Real Taglish social-media posts (50 samples)**

## 🔥 Why this model?

Among the models tested (RoBERTa-Tagalog, mBERT, XLM-R, DistilBERT), **DistilBERT + Taglish Paraphrasing** achieved the **highest weighted F1 on the real Taglish social-media set**. Combined with DistilBERT's small size, this makes it a good fit for browser extensions and on-device inference.

### **Real Taglish Weighted F1 Scores:**

| Model Variant | Real Taglish F1 |
|---------------|-----------------|
| RoBERTa Baseline | ~0.23 |
| RoBERTa + Paraphrasing | ~0.31 |
| RoBERTa + Adv | ~0.36 |
| mBERT | ~0.32 |
| XLM-R | ~0.35 |
| DistilBERT Baseline | ~0.36 |
| **DistilBERT + Paraphrase (THIS MODEL)** | **~0.41 (BEST)** |

On this 50-post set, the result suggests stronger robustness to the **noisy, code-switched Taglish** found on social media platforms.

---

## 🚀 Intended Use

- Chrome browser extension
- Real-time fake news detection
- Noisy Taglish text (Facebook, TikTok, IG comments, etc.)
- Code-switched Filipino–English environments

Not a substitute for professional fact-checking.

## 📦 Labels

- **0 = real**
- **1 = fake**
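
The label mapping above can be applied to raw classifier outputs roughly as follows. This is a minimal sketch, assuming the model returns two logits ordered as `[real, fake]` to match the label ids; `decode_prediction` and `fake_threshold` are hypothetical names introduced for illustration, and the logits in the example are made up rather than real model output.

```python
import math

# Label ids as documented in this model card.
ID2LABEL = {0: "real", 1: "fake"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, fake_threshold=0.5):
    """Turn a 2-element logit vector into (label, probability of 'fake').

    fake_threshold is a hypothetical knob for illustration; the model card
    does not specify a decision threshold.
    """
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected exactly one logit per label")
    p_fake = softmax(logits)[1]
    label = ID2LABEL[1] if p_fake >= fake_threshold else ID2LABEL[0]
    return label, p_fake


if __name__ == "__main__":
    # Made-up logits, for illustration only.
    label, p_fake = decode_prediction([0.2, 1.3])
    print(f"{label} (p_fake={p_fake:.3f})")
```

In practice the logits would come from a sequence-classification forward pass over the tokenized post. Raising `fake_threshold` trades recall on fake posts for fewer false alarms, which may matter in a browser extension that flags content in real time.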