Diffutron: A Masked Diffusion Language Model for Turkish Language
| 🤗 Models | 📊 Pre-training Dataset | 📄 Paper |
Overview
Diffutron is a lightweight, non-autoregressive Masked Diffusion Language Model (MDLM) specifically optimized for the Turkish language. By utilizing a discrete diffusion process, Diffutron generates text through iterative refinement, allowing for bi-directional context awareness and high parameter efficiency.
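The iterative refinement described above can be illustrated with a toy sampler. This is a minimal sketch, not Diffutron's actual decoding code: `toy_denoiser` is a hypothetical stand-in that proposes random tokens with random confidences, and the confidence-ranked unmasking schedule is an assumption borrowed from common masked diffusion samplers. It only shows the control flow: start from a fully masked sequence and commit a share of the positions at each step.

```python
import random

MASK = "<mask>"

def toy_denoiser(tokens, vocab, rng):
    """Hypothetical stand-in for the model: propose a (token, confidence)
    pair for every position that is still masked."""
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }

def iterative_unmask(length, vocab, steps, seed=0):
    """Start fully masked and commit the most confident proposals each step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        proposals = toy_denoiser(tokens, vocab, rng)
        if not proposals:
            break
        # Spread the remaining masked positions evenly over the remaining steps.
        remaining_steps = steps - step
        n_commit = -(-len(proposals) // remaining_steps)  # ceiling division
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:n_commit]:
            tokens[pos] = tok
    return tokens

vocab = ["bir", "kitap", "okudum", "bugün", "güzel"]
result = iterative_unmask(length=8, vocab=vocab, steps=4)
print(result)
```

Because every position is visible to the denoiser at every step, each proposal can condition on tokens to its left and right, which is where the bi-directional context awareness comes from.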
Core Features
- Architecture: Discrete Masked Diffusion (MDLM) using a 307M parameter encoder backbone.
- Efficiency: Achieves competitive performance against 2B+ parameter autoregressive models on Turkish benchmarks.
- Adaptation: LoRA-based (r=256) continual pre-training on a Turkish corpus of 2M sequences.
- Instruction Tuning: Progressive strategy using LlamaTurk and InstrucTurca datasets for enhanced command following.
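To give a feel for what a LoRA rank of 256 means in parameter terms, the arithmetic below counts the weights a rank-r adapter adds to one dense projection. The rank comes from the list above; the hidden size of 1024 is a hypothetical value chosen for illustration, since the backbone's layer dimensions are not stated here.

```python
def lora_added_params(d_in, d_out, rank):
    """Parameters in the low-rank pair A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

def dense_params(d_in, d_out):
    """Parameters in the frozen dense weight that the adapter sits beside."""
    return d_in * d_out

HIDDEN = 1024  # hypothetical hidden size, not taken from the model card
RANK = 256     # LoRA rank stated in the feature list

added = lora_added_params(HIDDEN, HIDDEN, RANK)
dense = dense_params(HIDDEN, HIDDEN)
ratio = added / dense

print(f"adapter params per projection: {added:,}")
print(f"dense params per projection:   {dense:,}")
print(f"adapter / dense ratio:         {ratio:.2f}")
```

Under that assumed hidden size, a rank-256 adapter holds half as many parameters as the square projection it adapts, far more capacity than the single-digit or low double-digit ranks often used for light fine-tuning.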
Benchmarks
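Diffutron achieves a significant reduction in perplexity and competitive scores across the CETVEL benchmark suite. With 0.3B parameters, the second-stage model averages 34.68, ahead of TURNA (1.1B, 33.19), Kumru (2B, 34.09), and Aya-101 (13B, 31.78), and behind Kanarya (2B, 40.78), Llama-3.2 (3B, 42.63), and Trendyol (7B, 43.95):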
| Benchmark | Diffutron-1st-Stage (0.3B) | Diffutron-2nd-Stage (0.3B) | TURNA (1.1B) | Kumru (2B) | Kanarya (2B) | Llama-3.2 (3B) | Trendyol (7B) | Aya-101 (13B) |
|---|---|---|---|---|---|---|---|---|
| Belebele_TR | 22.22 | 27.00 | 22.56 | 29.00 | 28.11 | 55.78 | 36.22 | 22.89 |
| EXAMS_TR | 25.95 | 27.74 | 23.66 | 30.03 | 30.03 | 26.21 | 28.50 | 22.90 |
| IronyTR | 50.67 | 52.00 | 48.33 | 51.00 | 50.00 | 50.17 | 50.00 | 52.17 |
| News_Cat | 23.20 | 32.40 | 32.80 | 26.40 | 66.80 | 64.00 | 81.20 | 20.00 |
| MNLI_TR | 33.29 | 32.81 | 34.94 | 36.42 | 33.40 | 34.76 | 35.19 | 27.90 |
| STS_TR | 17.77 | 18.78 | 14.21 | 11.75 | 12.91 | 12.91 | 15.52 | 16.97 |
| XCOPA_TR | 53.80 | 52.00 | 55.80 | 54.00 | 64.20 | 54.60 | 61.00 | 59.60 |
| Average | 32.41 | 34.68 | 33.19 | 34.09 | 40.78 | 42.63 | 43.95 | 31.78 |
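The Average row can be reproduced as an unweighted mean over the seven benchmarks. The sketch below copies the per-benchmark scores from the table and assumes simple macro-averaging with rounding to two decimals, which matches the reported values; the variable names are illustrative.

```python
models = [
    "Diffutron-1st-Stage (0.3B)", "Diffutron-2nd-Stage (0.3B)", "TURNA (1.1B)",
    "Kumru (2B)", "Kanarya (2B)", "Llama-3.2 (3B)", "Trendyol (7B)", "Aya-101 (13B)",
]

# One row per benchmark, columns in the same order as `models`.
rows = {
    "Belebele_TR": [22.22, 27.00, 22.56, 29.00, 28.11, 55.78, 36.22, 22.89],
    "EXAMS_TR":    [25.95, 27.74, 23.66, 30.03, 30.03, 26.21, 28.50, 22.90],
    "IronyTR":     [50.67, 52.00, 48.33, 51.00, 50.00, 50.17, 50.00, 52.17],
    "News_Cat":    [23.20, 32.40, 32.80, 26.40, 66.80, 64.00, 81.20, 20.00],
    "MNLI_TR":     [33.29, 32.81, 34.94, 36.42, 33.40, 34.76, 35.19, 27.90],
    "STS_TR":      [17.77, 18.78, 14.21, 11.75, 12.91, 12.91, 15.52, 16.97],
    "XCOPA_TR":    [53.80, 52.00, 55.80, 54.00, 64.20, 54.60, 61.00, 59.60],
}

# Average row as reported in the table.
reported = [32.41, 34.68, 33.19, 34.09, 40.78, 42.63, 43.95, 31.78]

# Transpose rows into per-model columns and take the unweighted mean.
columns = list(zip(*rows.values()))
averages = {m: round(sum(col) / len(col), 2) for m, col in zip(models, columns)}
ranking = sorted(averages, key=averages.get, reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {averages[name]:.2f}")
```

Sorting by the recomputed average places the second-stage model fourth of eight, directly behind the three largest-scoring autoregressive baselines.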
Citation
@misc{diffutron2026,
title={Diffutron: A Masked Diffusion Language Model for Turkish Language},
author={Şuayp Talha Kocabay and Talha Rüzgar Akkuş},
year={2026},
eprint={2603.20466},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.20466},
}