MixMinMatch
Collection of datasets from the MixMinMatch work.

Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets
Paper • 2512.18834 • Published Dec 21, 2025 • 4

AdaMLLab/AraMix • Updated Jan 30 • 394M • 3.5k • 5
AdaMLLab/TurMix • Updated Jan 30 • 681M • 12.4k • 5
AdaMLLab/HinMix • Updated Jan 30 • 179M • 9.58k • 1