Today I am releasing 105 open-source models for Personally Identifiable Information (PII) detection in French, German, and Italian.
All Apache 2.0 licensed. Free for commercial use. No restrictions.
Performance:
- French: 97.97% F1 (top model)
- German: 97.61% F1 (top model)
- Italian: 97.28% F1 (top model)

All top-10 models per language exceed 96% F1.
Coverage:
- 55+ PII entity types per language
- Native ID formats: NSS (French), Sozialversicherungsnummer (German), Codice Fiscale (Italian)
- Language-specific address, phone, and name patterns
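
To make the point about native ID formats concrete, here is a minimal Python sketch of how differently the three national identifiers are shaped. It is an illustration only, not how the released models work: the patterns, the `ID_PATTERNS` table, and the `match_id_shape` helper are my own assumptions. They check the overall shape of each identifier, not checksums or edge cases such as Italian omocodia variants, and the sample values are made up.

```python
import re

# Shape-only patterns (illustrative assumptions, not the models' logic).
ID_PATTERNS = {
    # French NSS (NIR), 15 characters: sex digit, year (2), month (2),
    # department (2 digits, or 2A/2B for Corsica), commune (3),
    # order number (3), control key (2).
    "FR_NSS": re.compile(r"[12]\d{4}(?:\d{2}|2[AB])\d{6}\d{2}"),
    # German Sozialversicherungsnummer, 12 characters: area code (2),
    # birth date DDMMYY (6), initial of birth name (1 letter),
    # serial number (2), check digit (1).
    "DE_SVNR": re.compile(r"\d{8}[A-Z]\d{3}"),
    # Italian Codice Fiscale, 16 characters: surname/name letters (6),
    # year (2), month letter, day/gender (2), municipality code
    # (letter + 3 digits), check letter.
    "IT_CODICE_FISCALE": re.compile(r"[A-Z]{6}\d{2}[ABCDEHLMPRST]\d{2}[A-Z]\d{3}[A-Z]"),
}


def match_id_shape(value: str) -> list[str]:
    """Return the names of all ID formats whose shape the value matches."""
    # Drop spaces, dots, and hyphens that people use as visual separators.
    compact = re.sub(r"[\s.\-]", "", value).upper()
    return [name for name, pattern in ID_PATTERNS.items() if pattern.fullmatch(compact)]


if __name__ == "__main__":
    for sample in ("1 84 12 76 451 089 46", "12 010190 M 012", "RSSMRA85T10A562S"):
        print(sample, "->", match_id_shape(sample))
```

Rules like these only catch identifiers written in a canonical form. In free-text clinical notes the same identifiers show up with typos, unusual spacing, or surrounded by ambiguous context, which is the gap that language-specific models are meant to cover.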
European healthcare operates in European languages. Clinical notes, patient records, and medical documents are generated in French, German, Italian, and other languages.
Effective de-identification requires:
- Native language understanding, not translation
- Local ID format recognition: each country has unique patterns
- Cultural context awareness: names, addresses, and formats vary

These models deliver production-ready accuracy without requiring data to leave your infrastructure or be translated out of its original language.
HIPAA & GDPR Compliance
Built for US and European privacy regulations:
- On-premise deployment: Process data locally with zero external dependencies
- Data sovereignty: No API calls, no cloud services, no cross-border transfers
- Air-gapped capable: Deploy in fully isolated environments if required
- Regulatory-grade accuracy: Supporting Expert Determination standards

HIPAA and GDPR compliance across languages, without compliance gaps.
Use Cases
- Hospital EHR systems: Automated patient record de-identification
- Clinical research: Multilingual dataset preparation for studies
- Insurance companies: Claims processing across