MnemoCore Pattern Learner: Specification Draft
Version: 0.1-draft
Date: 2026-02-20
Status: Draft for Review
Author: Omega (GLM-5) for Robin Granberg
Executive Summary
Pattern Learner is a MnemoCore module that learns from user interactions without storing personal data. It extracts statistical patterns, topic clusters, and quality metrics that can be used to improve chatbot performance over time.
Key principle: Learn patterns, forget people.
Problem Statement
Healthcare Chatbot Challenges
| Challenge | Consequence |
|---|---|
| GDPR/HIPAA compliance | Conversations cannot be stored |
| Multitenancy | Data must not leak between clinics |
| Quality improvement | Need to know what works |
| Knowledge gaps | Need to identify what is missing from the docs |
Current Solutions (Limitations)
- Stateless RAG: No learning at all
- Full memory: GDPR risk, confidentiality problems
- Manual analytics: Time-consuming, not real-time
Solution: Pattern Learner
Core Concept
User Query ──► Anonymize ──► Extract Pattern ──► Aggregate
                   │
                   └── PII removed before storage
What IS stored:
- Topic clusters (anonymized)
- Query frequency distributions
- Response quality aggregates
- Knowledge gap indicators
What is NOT stored:
- User identities
- Clinic associations
- Patient data
- Raw conversations
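The flow above can be sketched end to end. This is a minimal illustration under stated assumptions, not the module's implementation: the email-only redaction rule, the toy stopword set, and the names `redact`, `extract_pattern`, and `process` are all invented for the example. It shows that only topic counts survive the pipeline.

```python
import re
from collections import Counter

# Assumed, deliberately narrow redaction rule for this sketch (emails only).
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")
# Assumed toy stopword list; a real deployment would use a full list.
STOPWORDS = {"what", "does", "please", "reply", "about"}

topic_counts = Counter()  # the only state that is kept


def redact(text: str) -> str:
    """Step 1: strip PII before anything else touches the text."""
    return EMAIL_RE.sub("[EMAIL]", text)


def extract_pattern(text: str) -> list:
    """Step 2: reduce the redacted text to coarse topic keywords."""
    tokens = re.findall(r"\[\w+\]|\w+", text.lower())
    return [
        t for t in tokens
        if len(t) > 3 and t not in STOPWORDS and not t.startswith("[")
    ]


def process(query: str) -> None:
    """Step 3: aggregate counts; the query itself is discarded."""
    topic_counts.update(extract_pattern(redact(query)))


process("Please reply to anna@example.com about implant pricing")
process("What does an implant cost?")
print(topic_counts.most_common(1))
```

After both calls the counter holds only `implant`, `pricing`, and `cost`; neither the raw query nor the email address is retained anywhere.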
Architecture
High-Level Design
┌─────────────────────────────────────────────────────────────┐
│                   Pattern Learner Module                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌───────────────┐    ┌──────────────┐  │
│  │  Anonymizer  │───►│Topic Extractor│───►│  Aggregator  │  │
│  └──────────────┘    └───────────────┘    └──────────────┘  │
│         │                    │                   │          │
│         │                    ▼                   ▼          │
│         │            ┌───────────────┐    ┌──────────────┐  │
│         │            │Topic Embedder │    │ Stats Store  │  │
│         │            │  (MnemoCore)  │    │ (Encrypted)  │  │
│         │            └───────────────┘    └──────────────┘  │
│         │                    │                   │          │
│         └────────────────────┴───────────────────┘          │
│                              │                              │
│                              ▼                              │
│                      ┌───────────────┐                      │
│                      │ Insights API  │                      │
│                      └───────────────┘                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Components
1. Anonymizer
Purpose: Remove all PII before processing
Methods:
- Named Entity Recognition (NER) for person names
- Pattern matching for phone numbers, addresses
- Clinic/organization detection
- Session ID hashing
import re

class Anonymizer:
    """Remove PII from queries before pattern extraction"""

    def __init__(self, clinic_blacklist=None):
        # load_ner_model is assumed to be provided by the project (not shown here)
        self.ner_model = load_ner_model("sv")  # Swedish
        # Configurable list of clinic names to redact
        self.clinic_blacklist = list(clinic_blacklist or [])
        # Applied in insertion order: the phone pattern is broad enough to match
        # personal numbers, so the more specific patterns run first
        self.patterns = {
            "email": r"[\w\.-]+@[\w\.-]+\.\w+",
            "personal_number": r"\d{6,8}[-\s]?\d{4}",
            "phone": r"\+?\d{1,3}[\s-]?\d{2,4}[\s-]?\d{2,4}[\s-]?\d{2,4}",
        }

    def anonymize(self, text: str) -> str:
        """Remove all PII from text"""
        # 1. NER for person and organization names
        entities = self.ner_model.extract(text)
        for entity in entities:
            if entity.type in ["PER", "ORG"]:
                text = text.replace(entity.text, "[ANON]")
        # 2. Pattern matching
        for pattern_type, pattern in self.patterns.items():
            text = re.sub(pattern, f"[{pattern_type.upper()}]", text)
        # 3. Remove clinic names (configurable blacklist)
        for clinic_name in self.clinic_blacklist:
            text = text.replace(clinic_name, "[KLINIK]")
        return text
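The method list above mentions session ID hashing, which the class does not show. A minimal sketch, assuming a keyed hash (HMAC-SHA256) is acceptable and that the resulting token is only used in flight, since the data minimization table lists session IDs as not stored. The function name `hash_session_id` and the way the key is obtained are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def hash_session_id(session_id: str, secret_key: bytes) -> str:
    """Return a keyed, non-reversible token for a session ID."""
    digest = hmac.new(secret_key, session_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key; in practice it would come from the key management system
key = secrets.token_bytes(32)
token = hash_session_id("session-42", key)
print(len(token))  # 64 hex characters
```

A keyed hash is used instead of plain SHA-256 because session IDs can be short or guessable; without the key, a token cannot be matched back to an ID by brute force, and rotating the key makes old tokens unlinkable.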
2. Topic Extractor
Purpose: Extract semantic topics from anonymized queries
Methods:
- Keyword extraction (TF-IDF)
- Topic modeling (LDA, BERTopic)
- Embedding-based clustering
from typing import List

class TopicExtractor:
    """Extract topics from anonymized queries"""

    def __init__(self, mnemocore_engine):
        self.engine = mnemocore_engine
        self.topic_threshold = 0.5  # Minimum similarity for reusing a stored topic

    async def extract_topics(self, query: str) -> List[str]:
        """Extract topics from anonymized query"""
        # 1. Get keywords
        keywords = self._extract_keywords(query)
        # 2. Find similar topics in MnemoCore
        similar = await self.engine.query(query, top_k=5)
        # 3. Collect the topics attached to sufficiently similar memories
        topics = []
        for memory_id, similarity in similar:
            if similarity > self.topic_threshold:
                memory = await self.engine.get_memory(memory_id)
                topics.extend(memory.metadata.get("topics", []))
        # 4. Deduplicate while keeping order, so topics[0] is stable for callers
        return list(dict.fromkeys(topics + keywords))

    def _extract_keywords(self, text: str) -> List[str]:
        """Extract keywords (simple stopword/length filter standing in for TF-IDF)"""
        # Simple implementation; STOPWORDS_SV is assumed to be defined elsewhere
        words = text.lower().split()
        return [w for w in words if len(w) > 3 and w not in STOPWORDS_SV]
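The method list names TF-IDF, while `_extract_keywords` only filters by length and stopwords. The following is a minimal pure-Python sketch of what a TF-IDF ranking could look like. It assumes a small reference corpus of anonymized queries is available; the function name `tfidf_keywords` and the smoothed IDF formula are choices made for this example, not part of the spec.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_keywords(query: str, corpus: List[str], top_n: int = 3) -> List[str]:
    """Rank the words of `query` by TF-IDF against a reference corpus."""
    docs = [set(doc.lower().split()) for doc in corpus]
    words = query.lower().split()
    tf = Counter(words)
    scores: Dict[str, float] = {}
    for word, count in tf.items():
        # Document frequency: how many corpus queries contain the word
        df = sum(1 for doc in docs if word in doc)
        # Smoothed IDF so unseen words do not divide by zero
        idf = math.log((1 + len(docs)) / (1 + df)) + 1.0
        scores[word] = (count / len(words)) * idf
    # Highest score first; ties broken alphabetically for determinism
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]


corpus = [
    "vad kostar implantat",
    "vad kostar rotfyllning",
    "vad betyder tandreglering",
]
print(tfidf_keywords("vad kostar implantat", corpus, top_n=1))
```

Words that appear in every corpus query (here "vad") get the lowest weight, so the rarest term, "implantat", ranks first without needing a stopword list.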
3. Aggregator
Purpose: Store statistical patterns without PII
Data structures:
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TopicStats:
    """Statistics for a topic"""
    topic: str
    count: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    trend: float = 0.0  # Recent increase/decrease

@dataclass
class ResponseQuality:
    """Aggregated response quality (no individual ratings)"""
    response_signature: str  # Hash of response template
    avg_rating: float = 0.5  # Normalized to 0-1 (rating / 5); 0.5 is the starting value
    sample_count: int = 0
    last_updated: Optional[datetime] = None

@dataclass
class KnowledgeGap:
    """Topics with no good answers"""
    topic: str
    query_count: int = 0
    failure_rate: float = 1.0  # Fraction (0-1) of queries that got "I don't know"
    suggested_action: str = ""  # "add documentation", "improve answer"
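The `trend` field is described only as a recent increase or decrease. One possible reading, sketched here under assumptions, is the relative change between the latest time window and the one before it. The spec does not define per-day counts, the window length, or how brand-new topics are scored, so `daily_counts`, `window=7`, and the cap of 1.0 are all hypothetical.

```python
from typing import Sequence


def compute_trend(daily_counts: Sequence[int], window: int = 7) -> float:
    """Relative change of the latest window versus the one before it."""
    if len(daily_counts) < 2 * window:
        return 0.0  # not enough history to compare two windows
    recent = sum(daily_counts[-window:])
    previous = sum(daily_counts[-2 * window:-window])
    if previous == 0:
        # Assumed convention: a topic appearing from nothing is capped at +1.0
        return 0.0 if recent == 0 else 1.0
    return (recent - previous) / previous


# 10 queries per day last week, 12 per day this week
counts = [10] * 7 + [12] * 7
print(round(compute_trend(counts), 2))  # 0.2
```

With this definition, a value such as 0.15 in the insights output would mean 15% more queries in the latest window than in the previous one.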
Storage:
from datetime import datetime
from typing import Dict

class PatternStore:
    """Store patterns (encrypted, no PII)"""

    def __init__(self, encryption_key: bytes):
        self.key = encryption_key  # For encryption at rest (persistence not shown)
        self.topics: Dict[str, TopicStats] = {}
        self.qualities: Dict[str, ResponseQuality] = {}
        self.gaps: Dict[str, KnowledgeGap] = {}

    def record_topic(self, topic: str):
        """Record that a topic was queried"""
        now = datetime.utcnow()
        if topic not in self.topics:
            self.topics[topic] = TopicStats(topic=topic, first_seen=now)
        stats = self.topics[topic]
        stats.count += 1
        stats.last_seen = now

    def record_quality(self, response_sig: str, rating: int):
        """Record response quality (aggregated); rating is on a 1-5 scale"""
        if response_sig not in self.qualities:
            self.qualities[response_sig] = ResponseQuality(
                response_signature=response_sig
            )
        q = self.qualities[response_sig]
        # Exponential moving average over ratings normalized to 0-1
        q.avg_rating = 0.9 * q.avg_rating + 0.1 * (rating / 5.0)
        q.sample_count += 1
        q.last_updated = datetime.utcnow()

    def record_gap(self, topic: str, had_answer: bool):
        """Record knowledge gap"""
        if topic not in self.gaps:
            self.gaps[topic] = KnowledgeGap(topic=topic)
        gap = self.gaps[topic]
        gap.query_count += 1
        # Running mean of failures: a missing answer counts as 1, an answer as 0
        failed = 0 if had_answer else 1
        gap.failure_rate = (gap.failure_rate * (gap.query_count - 1) + failed) / gap.query_count
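The exponential moving average in `record_quality` starts from 0.5 and moves 10% of the way toward each new rating, so early averages are dominated by the starting value. The sketch below replays that update rule in isolation to show how quickly it converges. The helper names `ema_after` and `closed_form` are invented for this illustration.

```python
def ema_after(n: int, rating: int, prior: float = 0.5, alpha: float = 0.1) -> float:
    """Apply the record_quality update n times with the same 1-5 rating."""
    avg = prior
    for _ in range(n):
        avg = (1 - alpha) * avg + alpha * (rating / 5.0)
    return avg


def closed_form(n: int, rating: int, prior: float = 0.5, alpha: float = 0.1) -> float:
    """Same value without the loop: target + (prior - target) * (1 - alpha)^n."""
    target = rating / 5.0
    return target + (prior - target) * (1 - alpha) ** n


for n in (1, 10, 30):
    print(n, round(ema_after(n, rating=5), 3))
```

For a response that always receives 5 stars, the stored average is 0.55 after one rating, about 0.83 after ten, and about 0.98 after thirty. Consumers of `avg_rating` may therefore want to take `sample_count` into account before ranking responses.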
4. Insights API
Purpose: Provide actionable insights to admins/developers
Endpoints:
# GET /insights/topics?top_k=10
{
  "topics": [
    {"topic": "implantat", "count": 1250, "trend": 0.15},
    {"topic": "rotfyllning", "count": 980, "trend": -0.02},
    {"topic": "priser", "count": 850, "trend": 0.30}
  ],
  "period": "30d"
}
# GET /insights/gaps
{
  "knowledge_gaps": [
    {
      "topic": "tandreglering vuxna",
      "query_count": 145,
      "failure_rate": 0.85,
      "suggested_action": "add documentation"
    },
    {
      "topic": "akut tandvård",
      "query_count": 89,
      "failure_rate": 0.72,
      "suggested_action": "improve answer"
    }
  ]
}
# GET /insights/quality
{
  "top_responses": [
    {"signature": "abc123", "avg_rating": 4.8, "sample_count": 520},
    {"signature": "def456", "avg_rating": 4.5, "sample_count": 340}
  ],
  "worst_responses": [
    {"signature": "xyz789", "avg_rating": 2.1, "sample_count": 45}
  ]
}
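The response bodies above can be derived directly from the aggregated statistics. As a hedged sketch, the function below builds the `/insights/topics` payload from a dictionary of topic statistics. The name `top_topics_payload`, the trimmed `TopicStats` copy, and the fixed `period` default are assumptions for the example; the spec does not say how the period is computed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TopicStats:  # trimmed copy of the spec's dataclass, for a runnable example
    topic: str
    count: int = 0
    trend: float = 0.0


def top_topics_payload(topics: Dict[str, TopicStats], top_k: int = 10,
                       period: str = "30d") -> dict:
    """Build the /insights/topics response body from aggregated stats."""
    ranked = sorted(topics.values(), key=lambda s: s.count, reverse=True)
    return {
        "topics": [
            {"topic": s.topic, "count": s.count, "trend": s.trend}
            for s in ranked[:top_k]
        ],
        "period": period,
    }


stats = {
    "priser": TopicStats("priser", 850, 0.30),
    "implantat": TopicStats("implantat", 1250, 0.15),
    "rotfyllning": TopicStats("rotfyllning", 980, -0.02),
}
payload = top_topics_payload(stats, top_k=2)
print(payload)
```

Because the payload is built only from `TopicStats` fields, nothing beyond topic labels, counts, and trends can reach the API consumer.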
MnemoCore Integration
Usage Pattern
from mnemocore import HAIMEngine
from mnemocore.pattern_learner import PatternLearner

# Anonymizer is the class defined under Components; get_encryption_key and
# rag_lookup are application-provided helpers (not shown here)

# Initialize MnemoCore (stores topic embeddings)
# Note: top-level awaits must run inside an async function or event loop
engine = HAIMEngine(dimension=16384)
await engine.initialize()

# Initialize Pattern Learner
learner = PatternLearner(
    engine=engine,
    encryption_key=get_encryption_key(),
    anonymizer=Anonymizer()
)

# Process a query (automatic learning)
# tenant_id is deliberately never passed to the learner
async def handle_query(user_query: str, tenant_id: str):
    # 1. Anonymize
    anon_query = learner.anonymize(user_query)
    # 2. Extract patterns (no PII)
    topics = await learner.extract_topics(anon_query)
    # 3. Record topic usage
    for topic in topics:
        learner.record_topic(topic)
    # 4. Get answer from RAG
    answer = await rag_lookup(anon_query)
    # 5. Record if we had an answer
    learner.record_gap(
        topic=topics[0] if topics else "unknown",
        had_answer=(answer is not None)
    )
    return answer

# Get insights (admin only)
async def get_dashboard():
    top_topics = learner.get_top_topics(10)
    gaps = learner.get_knowledge_gaps()
    quality = learner.get_response_quality()
    return {
        "popular_topics": top_topics,
        "needs_documentation": gaps,
        "response_performance": quality
    }
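The usage pattern calls `anonymize`, `extract_topics`, `record_topic`, and `record_gap` on a `PatternLearner` object that the spec does not define. The following is one way such a facade could be wired, shown with stand-in components so that it runs without MnemoCore or an NER model. The constructor signature here takes plain callables and differs from the `engine=`, `encryption_key=`, `anonymizer=` form used above; it and the `fake_*` helpers are assumptions for illustration only.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, List


class PatternLearner:
    """Thin facade over an anonymizer, a topic extractor and counters (sketch)."""

    def __init__(self, anonymize: Callable[[str], str],
                 extract_topics: Callable[[str], Awaitable[List[str]]]):
        self._anonymize = anonymize
        self._extract_topics = extract_topics
        self.topic_counts = Counter()
        self.gap_counts = Counter()

    def anonymize(self, text: str) -> str:
        return self._anonymize(text)

    async def extract_topics(self, text: str) -> List[str]:
        return await self._extract_topics(text)

    def record_topic(self, topic: str) -> None:
        self.topic_counts[topic] += 1

    def record_gap(self, topic: str, had_answer: bool) -> None:
        if not had_answer:
            self.gap_counts[topic] += 1


# Stand-ins so the sketch runs on its own
def fake_anonymize(text: str) -> str:
    return text.replace("Anna", "[ANON]")


async def fake_extract_topics(text: str) -> List[str]:
    # Keep longer alphabetic words; placeholders such as "[anon]" are dropped
    return [w for w in text.lower().split() if len(w) > 5 and w.isalpha()]


async def demo() -> PatternLearner:
    learner = PatternLearner(fake_anonymize, fake_extract_topics)
    anon = learner.anonymize("Anna asks about implant pricing")
    for topic in await learner.extract_topics(anon):
        learner.record_topic(topic)
    learner.record_gap("implant", had_answer=False)
    return learner


learner = asyncio.run(demo())
print(learner.topic_counts, learner.gap_counts)
```

Injecting the components keeps the facade testable: the anonymizer and extractor can be replaced by fakes, and the only state the learner holds is aggregate counters.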
GDPR Compliance
Data Minimization
| Data Type | Stored? | Justification |
|---|---|---|
| Raw queries | No | PII risk |
| User IDs | No | Not needed |
| Session IDs | No | Not needed |
| Clinic IDs | No | Not needed |
| Topic labels | Yes | Anonymized |
| Topic counts | Yes | Statistical |
| Quality scores | Yes | Aggregated |
| Gap indicators | Yes | Anonymized |
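Right to Erasure (GDPR Art. 17)
Because no PII is stored, there is no personal data to erase, so the right to erasure is satisfied by design. This holds only as long as the Anonymizer removes all identifiers before anything is stored.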
Data Retention
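from datetime import datetime, timedelta

# Configurable retention
retention_policy = {
    "topic_stats": "365d",      # Keep for 1 year
    "quality_scores": "90d",    # Keep for 3 months
    "gap_indicators": "30d",    # Refresh monthly
}

def retention_days(key: str) -> int:
    """Convert a policy value such as "365d" to a number of days"""
    return int(retention_policy[key].rstrip("d"))

# Automatic cleanup
async def cleanup_old_patterns():
    cutoff = datetime.utcnow() - timedelta(days=retention_days("topic_stats"))
    # Iterate over a copy: deleting from a dict while iterating it raises RuntimeError
    for topic, stats in list(learner.topics.items()):
        if stats.last_seen < cutoff:
            del learner.topics[topic]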
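The cleanup above covers only topic statistics, while the policy also lists quality scores and gap indicators. A generic pruning helper could apply the same rule to any of the stores. This is a sketch under assumptions: `prune_older_than` and its `timestamp_of` callback are hypothetical names, and entries without a timestamp are kept rather than deleted.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


def prune_older_than(items: Dict[str, T], max_age: timedelta,
                     timestamp_of: Callable[[T], Optional[datetime]],
                     now: datetime) -> int:
    """Delete entries whose timestamp is older than max_age; return the count."""
    cutoff = now - max_age
    # Collect keys first so the dict is not modified while being iterated
    stale = [
        key for key, value in items.items()
        if timestamp_of(value) is not None and timestamp_of(value) < cutoff
    ]
    for key in stale:
        del items[key]
    return len(stale)


# Example with bare timestamps standing in for stored records
now = datetime(2026, 2, 20)
last_seen = {
    "old-topic": datetime(2025, 1, 1),
    "fresh-topic": datetime(2026, 2, 1),
    "no-timestamp": None,
}
removed = prune_older_than(last_seen, timedelta(days=365), lambda ts: ts, now)
print(removed, sorted(last_seen))
```

For the spec's stores, `timestamp_of` would read `last_seen` for topic statistics and `last_updated` for quality scores. `KnowledgeGap` as defined has no timestamp field, so enforcing its retention period would need one to be added.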
Security Considerations
Encryption
- All pattern data encrypted at rest (AES-256)
- Encryption keys managed via HSM or Azure Key Vault
- Per-tenant encryption optional (for multi-tenant isolation)
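For the optional per-tenant encryption, one common approach is to derive a separate key for each tenant from a single master key instead of storing many keys. The sketch below implements HKDF with SHA-256 (RFC 5869) using only the standard library. The spec does not prescribe this scheme; the function names and the `pattern-store:` label are assumptions, and a production system would more likely call a vetted library or the key vault's own derivation feature.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32,
                salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract: an empty salt defaults to a block of zero bytes
    prk = hmac.new(salt or b"\x00" * hash_len, master_key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output is produced
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a 256-bit key that is unique per tenant."""
    return hkdf_sha256(master_key, info=b"pattern-store:" + tenant_id.encode("utf-8"))


master = b"\x01" * 32  # placeholder; a real key would come from the HSM or vault
key_a = tenant_key(master, "clinic-a")
key_b = tenant_key(master, "clinic-b")
print(len(key_a), key_a != key_b)
```

Each derived key is 32 bytes, suitable for AES-256, and a tenant's key can be recomputed on demand, so only the master key needs to be protected.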
Access Control
# Insights API requires admin role
@app.get("/insights/topics")
@require_role("admin")
async def get_topics():
    return learner.get_top_topics(10)
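The `require_role` decorator is used above but not defined. A framework-agnostic sketch of how it might work is shown here. It assumes the caller's role is made available through a context variable set by authentication middleware; `current_role`, `Forbidden`, and `call_as` are hypothetical names, and a real web framework would typically use its own dependency or middleware mechanism instead.

```python
import asyncio
import functools
from contextvars import ContextVar

# Hypothetical: the role of the caller, set by authentication middleware
current_role = ContextVar("current_role", default="anonymous")


class Forbidden(Exception):
    """Raised when the caller lacks the required role."""


def require_role(role: str):
    """Decorator that rejects calls unless the current role matches."""
    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(*args, **kwargs):
            if current_role.get() != role:
                raise Forbidden(f"role {role!r} required")
            return await handler(*args, **kwargs)
        return wrapper
    return decorator


@require_role("admin")
async def get_topics():
    return ["implantat", "rotfyllning"]


async def call_as(role: str):
    """Simulate a request made with the given role."""
    current_role.set(role)
    try:
        return await get_topics()
    except Forbidden:
        return None


print(asyncio.run(call_as("viewer")), asyncio.run(call_as("admin")))
```

The check runs before the handler body, so a non-admin caller never triggers a read of the pattern data.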
Audit Logging
# Log all pattern access (not the patterns themselves)
async def log_access(user_id: str, endpoint: str, timestamp: datetime):
    await audit_log.store({
        "user_id": user_id,
        "endpoint": endpoint,
        "timestamp": timestamp.isoformat(),
        # No pattern data logged
    })
Implementation Roadmap
Phase 1: MVP (2 weeks)
- Anonymizer with Swedish NER
- Basic topic extraction (keywords)
- Topic counter (no MnemoCore yet)
- Simple insights API
Phase 2: MnemoCore Integration (2 weeks)
- Topic embedding storage in MnemoCore
- Semantic topic clustering
- Gap detection using similarity search
Phase 3: Quality Metrics (2 weeks)
- Response quality tracking
- Feedback integration
- Quality dashboard
Phase 4: Production Hardening (2 weeks)
- Encryption at rest
- Access control
- Audit logging
- Performance optimization
Business Value
For Healthcare Organizations
| Insight | Value |
|---|---|
| Documentation gaps | Know what to add to knowledge base |
| Popular topics | Prioritize documentation efforts |
| Response quality | Improve user satisfaction |
| Trend analysis | Identify emerging needs |
For Opus Dental (Competitive Advantage)
| Advantage | Value |
|---|---|
| Continuous improvement | Chatbot gets smarter without storing PII |
| Customer insights | Know what clinics need |
| Compliance by design | GDPR-safe from day 1 |
| Unique selling point | "Learning chatbot" vs competitors |
Technical Requirements
Dependencies
mnemocore>=4.5.0
spacy>=3.7.0 # Swedish NER (install a Swedish pipeline such as sv_core_news_sm separately)
numpy>=1.24.0
cryptography>=41.0.0 # Encryption
Infrastructure
- MnemoCore instance (can be shared or per-tenant)
- Encrypted storage (Azure SQL, PostgreSQL with TDE)
- Optional: Azure Key Vault for key management
Performance
- Topic extraction: < 50 ms per query
- Insights API: < 200 ms per request
- Storage: ~1 KB per unique topic
Open Questions
1. Topic granularity: How specific should topics be? "Implantat" vs "Implantat pris" vs "Implantat komplikationer"
2. Trend detection: What time window for trend analysis? 7d? 30d?
3. Multi-language: Support for Finnish/Norwegian in addition to Swedish?
4. Tenant isolation: Should patterns be shared across tenants (anonymized) or kept separate?
5. Feedback mechanism: How to collect ratings? Thumbs up/down? 1-5 stars?
Conclusion
Pattern Learner enables continuous improvement of healthcare chatbots without GDPR risk. It learns what users ask about, which answers work, and where documentation is missing, all without storing any personal data.
Key innovation: Transform "memory" into "patterns" to get compliance-safe learning.
Next Steps
- Review this spec
- Decide on open questions
- Prioritize MVP features
- Start implementation