# PRISM-Memory
PRISM-Memory is a LoRA adapter that trains Qwen/Qwen2.5-7B-Instruct to write
proposition-level memory from dialogue. It is a memory-writing component, not a
general chat model.
## Released model

- Model name: PRISM-Memory 7B Adapter
- Base model: Qwen/Qwen2.5-7B-Instruct
- Adapter type: LoRA
## What this release shows
- A 7B open model can replace GPT-4.1 for the extraction step in this memory pipeline.
- On the confirmed release surface, PRISM-Memory scores 0.4768 on LongMemEval and 0.4981 on LoCoMo.
- The GPT-4.1-based PropMem reference scores 0.4650 on LongMemEval and 0.5360 on LoCoMo.
This comparison holds the QA layer constant. It compares extractor against extractor, not a full end-to-end GPT-4.1 system.
## Why this is useful
- It keeps hard limits and preferences available for later workflow generation.
- It keeps current state separate from future plans.
- It supports dated recall and clean refusal on unsupported questions.
See docs/release/memory-scenarios.md for compact end-to-end examples.
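The dated-recall and refusal behavior above can be illustrated with a toy in-memory store. The record shape and API below are hypothetical, for illustration only; they are not part of the released pipeline:

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    session_date: str  # ISO date the record was written, e.g. "2025-01-04"


class MemoryStore:
    """Toy store illustrating dated recall and refusal on unsupported questions."""

    def __init__(self) -> None:
        self.records: list[MemoryRecord] = []

    def write(self, text: str, session_date: str) -> None:
        self.records.append(MemoryRecord(text, session_date))

    def recall(self, keyword: str, as_of: str) -> list[str]:
        """Return records containing `keyword` written on or before `as_of`.

        An empty result signals the QA layer to refuse rather than guess.
        """
        return [
            r.text
            for r in self.records
            if keyword.lower() in r.text.lower() and r.session_date <= as_of
        ]


store = MemoryStore()
store.write("Primary database: PostgreSQL 13", "2025-01-04")
store.write("Latency target: API responses under 50ms", "2025-03-01")
print(store.recall("database", as_of="2025-02-01"))  # dated recall hits
print(store.recall("Kafka", as_of="2025-12-31"))     # empty -> refuse
```

A real deployment would replace the keyword scan with retrieval over the written propositions, but the contract is the same: recall is filtered by date, and an empty result triggers refusal instead of a guess.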
## Load the adapter
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "AsadIsmail/prism-memory"

# The tokenizer ships with the adapter repo; the base weights load separately.
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
```
This repo contains adapter weights only. You still need the base model.
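Once loaded, the model's generations are plain text, so downstream code still has to split them into discrete records. A minimal parsing sketch, assuming one dash- or asterisk-prefixed proposition per line (the adapter's actual output format may differ):

```python
def parse_memory_records(generated: str) -> list[str]:
    """Split generated text into one proposition per non-empty line.

    Assumes bullet-prefixed lines; the adapter's real output format may differ.
    """
    records = []
    for line in generated.splitlines():
        line = line.strip().lstrip("-*").strip()
        if line:
            records.append(line)
    return records


sample = """- Deploy microservices on AWS Fargate
- Primary database: PostgreSQL 13
- Latency target: API responses under 50ms"""
print(parse_memory_records(sample))
```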
## Training data
PRISM-Memory was trained on synthetic multi-session memory conversations with GPT-4.1-derived memory-writing labels. The public release does not use real user chat logs.
| Item | Count | Notes |
|---|---|---|
| Synthetic training conversations | 2,329 | Multi-session conversations with inserts, updates, and deletes |
| Synthetic held-out conversations | 584 | Evaluation split used for held-out examples |
| Supervised extraction examples | 100,427 | Memory-writing labels derived from the synthetic corpus |
| Released training subset | 20,000 | Supervised examples used for the public adapter |
## Example training item

### Synthetic scenario

- Domain: cloud infrastructure performance optimization
- Persona: senior cloud systems engineer at a fintech startup
### Synthetic user turn

> Here’s the initial architecture outline: deploy microservices on AWS Fargate, use PostgreSQL 13 as the primary database, plan Kubernetes orchestration, use Redis for caching, and keep API latency under 50ms.
### Target memory records
- Deploy microservices on AWS Fargate
- Orchestrate containers on a Kubernetes cluster (planned)
- Primary database: PostgreSQL 13
- Use Redis as an in-memory caching layer
- Latency target: API responses under 50ms
The release makes the dataset design, counts, and example records public. It does not bundle the full raw corpus files.
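For orientation, one supervised extraction example pairs a turn (plus its scenario context) with its target records. The dict below is an illustrative representation of the item above, not the release's actual file format:

```python
# Hypothetical schema: one supervised example maps a user turn to target records.
training_item = {
    "domain": "cloud infrastructure performance optimization",
    "persona": "senior cloud systems engineer at a fintech startup",
    "user_turn": (
        "Here's the initial architecture outline: deploy microservices on "
        "AWS Fargate, use PostgreSQL 13 as the primary database, plan "
        "Kubernetes orchestration, use Redis for caching, and keep API "
        "latency under 50ms."
    ),
    "target_memories": [
        "Deploy microservices on AWS Fargate",
        "Orchestrate containers on a Kubernetes cluster (planned)",
        "Primary database: PostgreSQL 13",
        "Use Redis as an in-memory caching layer",
        "Latency target: API responses under 50ms",
    ],
}

# One turn can yield several records: facts, planned state, and constraints.
print(len(training_item["target_memories"]))
```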
## Confirmed results
| Benchmark | PRISM-Memory | GPT-4.1-based PropMem reference |
|---|---|---|
| LongMemEval | 0.4768 | 0.4650 |
| LoCoMo | 0.4981 | 0.5360 |
The reproduced evaluation hit the cached QA surface exactly: 460 hits, 0 misses.
## Extraction examples
### Infrastructure bottlenecks stay structured
- Session date: 2025-01-04 15:34:00
- Overlap score: 0.909
- Note: Near-exact match on two operational facts from a single held-out turn.
**Turn**

> yeah, no real caching beyond basic Docker layer caching. Jenkins nodes have limited capacity, and we sometimes hit queue delays during peak commits.
**GPT-4.1 reference**
- No caching beyond basic Docker layer caching
- Jenkins nodes have limited capacity and experience queue delays during peak commits
**PRISM-Memory**
- No Docker caching beyond basic layer caching
- Jenkins nodes have limited capacity; peak commits cause queue delays
### Numeric constraints and preferences survive extraction
- Session date: 2025-03-01 15:07:00
- Overlap score: 0.800
- Note: The trained model keeps both the hard concurrency cap and the desired notification style.
**Turn**

> yeah, I think starting with incremental scans and parallel matrix jobs makes sense. We have 20 concurrent jobs max on GitHub Actions currently. Also want to keep Slack notifications from Snyk consistent with other pipeline alerts—aggregated and concise. Can you help draft the workflow?
**GPT-4.1 reference**
- GitHub Actions concurrency limit: 20 concurrent jobs
- Wants Snyk Slack notifications aggregated and concise, consistent with other pipeline alerts
**PRISM-Memory**
- GitHub Actions concurrency limit: 20 concurrent jobs
- Snyk Slack notifications should be aggregated and concise
More held-out examples live in docs/release/extraction-examples.md.
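The overlap scores in the examples above are not defined in this card. As a rough illustration of what such a score can measure, here is a token-level F1 between the joined reference and predicted record lists; this is an assumption for illustration, not the release's exact metric:

```python
from collections import Counter


def token_f1(reference: list[str], predicted: list[str]) -> float:
    """Token-level F1 over the bag-of-words of two record lists.

    Illustrative only; the release's overlap metric may be defined differently.
    """
    ref_tokens = Counter(" ".join(reference).lower().split())
    pred_tokens = Counter(" ".join(predicted).lower().split())
    overlap = sum((ref_tokens & pred_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)


reference = ["GitHub Actions concurrency limit: 20 concurrent jobs"]
predicted = ["GitHub Actions concurrency limit: 20 concurrent jobs"]
print(round(token_f1(reference, predicted), 3))  # identical records -> 1.0
```

Paraphrases like "peak commits cause queue delays" versus "queue delays during peak commits" score high under a bag-of-words overlap because word order is ignored, which matches the near-exact behavior shown in the examples.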
## Bundled docs and artifacts
- docs/release/datasets.md
- docs/release/extraction-examples.md
- docs/release/extraction-skill.md
- docs/release/memory-scenarios.md
- docs/release/release-results.md
- docs/release/technical-blog.md
- results/release_summary.json
- results/extraction_examples.json
- results/try_it_sessions.json
## Demo

The companion Space is live at https://huggingface.co/spaces/AsadIsmail/prism-memory.
## Limitations
- This is a memory-writing component, not a general chat model.
- It is a LoRA adapter, not a standalone full checkpoint.
- The evaluation pipeline still uses a separate QA model to score retrieved memory.
- Temporal and inferential categories still trail stronger larger-model baselines.