Sparse Auto-Encoders (SAEs) for Mechanistic Interpretability Collection A compilation of sparse auto-encoders trained on large language models. • 34 items • Updated Oct 10 • 4
Gaperon: A Peppered English-French Generative Language Model Suite Paper • 2510.25771 • Published Oct 29 • 15
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection Paper • 2411.08868 • Published Nov 13, 2024 • 13
Awesome Document AI Collection A collection of open-source document AI 📄 📝 📈 • 27 items • Updated Mar 11, 2024 • 80
Harvesting Textual and Structured Data from the HAL Publication Repository Paper • 2407.20595 • Published Jul 30, 2024 • 22
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus Paper • 2406.08707 • Published Jun 13, 2024 • 17