AI & ML interests

Natural Language Processing, Token Classification, Sentence Segmentation, Historical Texts, Medieval Texts, Digital Humanities, Computational Humanities, Multilingual NLP, Text Alignment, Historical Language Processing

Organization Card

ProMeTEXT

ProMeTEXT (the Centre for PROcessing MEdieval TEXTs) develops datasets, models and tools for the computational study of medieval and historical texts.

Our work focuses on phrase-level segmentation, multilingual alignment, and the processing of medieval textual traditions across Romance languages, Latin, and Middle English.

Resources

  • Aquilign: a multilingual aligner for historical and philological corpora.
  • Aquilign Multilingual Segmenter: a Hugging Face model for phrase-level segmentation of historical texts.
  • Aquilign Explorer: a demo app showcasing multilingual alignment workflows.
  • Multilingual Segmentation Dataset: gold-standard segmentation data for medieval prose.
  • Parallel Alignment Corpora: multilingual aligned corpora used to fine-tune LaBSE and to evaluate multilingual alignment across historical textual traditions.
