Synthetic data derived from finepdfs
MultiSynt
community
Organization Card
MultiSynt is a collaborative initiative between OpenEuroLLM and EuroLLM that develops high-quality multilingual synthetic datasets for language model pretraining. By combining the expertise of both organizations, MultiSynt aims to advance synthetic training data that covers diverse European languages and so supports more inclusive AI development.
models 33
MultiSynt/nemotron-cc-italian-tower9b • 9
MultiSynt/2B-1TT-tower9b-mixture • 17
MultiSynt/2B-1TT-native-mixture • 24
MultiSynt/nemotron-cc-portuguese-opus • 19
MultiSynt/nemotron-cc-basque-opus • 47
MultiSynt/nemotron-cc-polish-opus • 13
MultiSynt/nemotron-cc-polish-tower9b • 22
MultiSynt/nemotron-cc-french-opus • 15
MultiSynt/nemotron-cc-french-tower9b • 23
MultiSynt/nemotron-cc-norwegian-tower9b • 49
datasets 6
MultiSynt/MT-Nemotron-CC • 15.6B • 414 • 1
MultiSynt/MT-HPLT2c • 1.76B • 368
MultiSynt/MT-Reasoning • 82M • 102
MultiSynt/MT-Reasoning-Prompts • 399M • 258
MultiSynt/nemotron-cc-spanish-opus-qe • 3.29B • 67
MultiSynt/finepdfs-summaries • 1.57B • 301 • 1