# PRISM-Memory
PRISM-Memory is a LoRA adapter that trains Qwen/Qwen2.5-7B-Instruct to write
proposition-level memory from dialogue. It is a memory-writing component, not a
general chat model.
## Released model

- Model name: PRISM-Memory 7B Adapter
- Base model: Qwen/Qwen2.5-7B-Instruct
- Adapter type: LoRA
## What this release shows
- A 7B open model can replace GPT-4.1 for the extraction step in this memory pipeline.
- On the confirmed release surface, PRISM-Memory scores 0.4768 on LongMemEval and 0.4981 on LoCoMo.
- The GPT-4.1-based PropMem reference scores 0.4650 on LongMemEval and 0.5360 on LoCoMo.
This comparison holds the QA layer constant. It compares extractor against extractor, not a full end-to-end GPT-4.1 system.
## Why this is useful
- It keeps hard limits and preferences available for later workflow generation.
- It keeps current state separate from future plans.
- It supports dated recall and clean refusal on unsupported questions.
See docs/release/memory-scenarios.md for compact end-to-end examples.
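The dated-recall and refusal behavior above can be illustrated with a toy in-memory store. The record shape and API below are hypothetical, for illustration only; they are not part of the released pipeline:

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    session_date: str  # ISO date the record was written, e.g. "2025-01-04"


class MemoryStore:
    """Toy store illustrating dated recall and refusal on unsupported questions."""

    def __init__(self) -> None:
        self.records: list[MemoryRecord] = []

    def write(self, text: str, session_date: str) -> None:
        self.records.append(MemoryRecord(text, session_date))

    def recall(self, keyword: str, as_of: str) -> list[str]:
        """Return records containing `keyword` written on or before `as_of`.

        An empty result signals the QA layer to refuse rather than guess.
        """
        return [
            r.text
            for r in self.records
            if keyword.lower() in r.text.lower() and r.session_date <= as_of
        ]


store = MemoryStore()
store.write("Primary database: PostgreSQL 13", "2025-01-04")
store.write("Latency target: API responses under 50ms", "2025-03-01")
print(store.recall("database", as_of="2025-02-01"))  # dated recall hits
print(store.recall("Kafka", as_of="2025-12-31"))     # empty -> refuse
```

A real deployment would replace the keyword scan with retrieval over the written propositions, but the contract is the same: recall is filtered by date, and an empty result triggers refusal instead of a guess.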
## Load the adapter
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "AsadIsmail/prism-memory"

# The tokenizer ships with the adapter repo; the base weights load separately.
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
```
This repo contains adapter weights only. You still need the base model.
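Once loaded, the model's generations are plain text, so downstream code still has to split them into discrete records. A minimal parsing sketch, assuming one dash- or asterisk-prefixed proposition per line (the adapter's actual output format may differ):

```python
def parse_memory_records(generated: str) -> list[str]:
    """Split generated text into one proposition per non-empty line.

    Assumes bullet-prefixed lines; the adapter's real output format may differ.
    """
    records = []
    for line in generated.splitlines():
        line = line.strip().lstrip("-*").strip()
        if line:
            records.append(line)
    return records


sample = """- Deploy microservices on AWS Fargate
- Primary database: PostgreSQL 13
- Latency target: API responses under 50ms"""
print(parse_memory_records(sample))
```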
## Training data
PRISM-Memory was trained on synthetic multi-session memory conversations with GPT-4.1-derived memory-writing labels. The public release does not use real user chat logs.
| Item | Count | Notes |
|---|---|---|
| Synthetic training conversations | 2,329 | Multi-session conversations with inserts, updates, and deletes |
| Synthetic held-out conversations | 584 | Evaluation split used for held-out examples |
| Supervised extraction examples | 100,427 | Memory-writing labels derived from the synthetic corpus |
| Released training subset | 20,000 | Supervised examples used for the public adapter |
## Example training item

### Synthetic scenario

- Domain: cloud infrastructure performance optimization
- Persona: senior cloud systems engineer at a fintech startup
### Synthetic user turn

> Here’s the initial architecture outline: deploy microservices on AWS Fargate, use PostgreSQL 13 as the primary database, plan Kubernetes orchestration, use Redis for caching, and keep API latency under 50ms.
### Target memory records
- Deploy microservices on AWS Fargate
- Orchestrate containers on a Kubernetes cluster (planned)
- Primary database: PostgreSQL 13
- Use Redis as an in-memory caching layer
- Latency target: API responses under 50ms
The release makes the dataset design, counts, and example records public. It does not bundle the full raw corpus files.
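For orientation, one supervised extraction example pairs a turn (plus its scenario context) with its target records. The dict below is an illustrative representation of the item above, not the release's actual file format:

```python
# Hypothetical schema: one supervised example maps a user turn to target records.
training_item = {
    "domain": "cloud infrastructure performance optimization",
    "persona": "senior cloud systems engineer at a fintech startup",
    "user_turn": (
        "Here's the initial architecture outline: deploy microservices on "
        "AWS Fargate, use PostgreSQL 13 as the primary database, plan "
        "Kubernetes orchestration, use Redis for caching, and keep API "
        "latency under 50ms."
    ),
    "target_memories": [
        "Deploy microservices on AWS Fargate",
        "Orchestrate containers on a Kubernetes cluster (planned)",
        "Primary database: PostgreSQL 13",
        "Use Redis as an in-memory caching layer",
        "Latency target: API responses under 50ms",
    ],
}

# One turn can yield several records: facts, planned state, and constraints.
print(len(training_item["target_memories"]))
```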
## Confirmed results
| Benchmark | PRISM-Memory | GPT-4.1-based PropMem reference |
|---|---|---|
| LongMemEval | 0.4768 | 0.4650 |
| LoCoMo | 0.4981 | 0.5360 |
The reproduced evaluation hit the cached QA surface exactly: 460 hits, 0 misses.
## Extraction examples
### Infrastructure bottlenecks stay structured
- Session date: 2025-01-04 15:34:00
- Overlap score: 0.909
- Note: Near-exact match on two operational facts from a single held-out turn.
**Turn**

> yeah, no real caching beyond basic Docker layer caching. Jenkins nodes have limited capacity, and we sometimes hit queue delays during peak commits.
**GPT-4.1 reference**
- No caching beyond basic Docker layer caching
- Jenkins nodes have limited capacity and experience queue delays during peak commits
**PRISM-Memory**
- No Docker caching beyond basic layer caching
- Jenkins nodes have limited capacity; peak commits cause queue delays
### Numeric constraints and preferences survive extraction
- Session date: 2025-03-01 15:07:00
- Overlap score: 0.800
- Note: The trained model keeps both the hard concurrency cap and the desired notification style.
**Turn**

> yeah, I think starting with incremental scans and parallel matrix jobs makes sense. We have 20 concurrent jobs max on GitHub Actions currently. Also want to keep Slack notifications from Snyk consistent with other pipeline alerts—aggregated and concise. Can you help draft the workflow?
**GPT-4.1 reference**
- GitHub Actions concurrency limit: 20 concurrent jobs
- Wants Snyk Slack notifications aggregated and concise, consistent with other pipeline alerts
**PRISM-Memory**
- GitHub Actions concurrency limit: 20 concurrent jobs
- Snyk Slack notifications should be aggregated and concise
More held-out examples live in docs/release/extraction-examples.md.
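The overlap scores in the examples above are not defined in this card. As a rough illustration of what such a score can measure, here is a token-level F1 between the joined reference and predicted record lists; this is an assumption for illustration, not the release's exact metric:

```python
from collections import Counter


def token_f1(reference: list[str], predicted: list[str]) -> float:
    """Token-level F1 over the bag-of-words of two record lists.

    Illustrative only; the release's overlap metric may be defined differently.
    """
    ref_tokens = Counter(" ".join(reference).lower().split())
    pred_tokens = Counter(" ".join(predicted).lower().split())
    overlap = sum((ref_tokens & pred_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 2 * precision * recall / (precision + recall)


reference = ["GitHub Actions concurrency limit: 20 concurrent jobs"]
predicted = ["GitHub Actions concurrency limit: 20 concurrent jobs"]
print(round(token_f1(reference, predicted), 3))  # identical records -> 1.0
```

Paraphrases like "peak commits cause queue delays" versus "queue delays during peak commits" score high under a bag-of-words overlap because word order is ignored, which matches the near-exact behavior shown in the examples.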
## Bundled docs and artifacts
- docs/release/datasets.md
- docs/release/extraction-examples.md
- docs/release/extraction-skill.md
- docs/release/memory-scenarios.md
- docs/release/release-results.md
- docs/release/technical-blog.md
- results/release_summary.json
- results/extraction_examples.json
- results/try_it_sessions.json
## Demo

The companion Space is live at https://huggingface.co/spaces/AsadIsmail/prism-memory.
## Limitations
- This is a memory-writing component, not a general chat model.
- It is a LoRA adapter, not a standalone full checkpoint.
- The evaluation pipeline still uses a separate QA model to score retrieved memory.
- Temporal and inferential categories still trail stronger larger-model baselines.