PRISM-Memory

PRISM-Memory is a LoRA adapter for Qwen/Qwen2.5-7B-Instruct that teaches the base model to write proposition-level memory records from dialogue. It is a memory-writing component, not a general chat model.

Released model

  • Model name: PRISM-Memory 7B Adapter
  • Base model: Qwen/Qwen2.5-7B-Instruct
  • Adapter type: LoRA

What this release shows

  • A 7B open model can replace GPT-4.1 for the extraction step in this memory pipeline.
  • On the confirmed release surface, PRISM-Memory scores 0.4768 on LongMemEval and 0.4981 on LoCoMo.
  • The GPT-4.1-based PropMem reference scores 0.4650 on LongMemEval and 0.5360 on LoCoMo.

This comparison holds the QA layer constant. It compares extractor against extractor, not a full end-to-end GPT-4.1 system.

Why this is useful

  • It keeps hard limits and preferences available for later workflow generation.
  • It keeps current state separate from future plans.
  • It supports dated recall and clean refusal on unsupported questions.

See docs/release/memory-scenarios.md for compact end-to-end examples.
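The dated-recall and clean-refusal behavior above can be sketched as a minimal lookup over stored records. The record shape and the `answer` helper here are illustrative assumptions, not the release's actual storage format:

```python
from datetime import date

# Hypothetical memory record shape: (as-of date, proposition text).
memories = [
    (date(2025, 1, 4), "Primary database: PostgreSQL 13"),
    (date(2025, 3, 1), "GitHub Actions concurrency limit: 20 concurrent jobs"),
]

def answer(question_keywords, as_of):
    """Return the newest record (up to `as_of`) matching every keyword,
    or refuse cleanly when nothing in memory supports the question."""
    hits = [
        (d, text) for d, text in memories
        if d <= as_of and all(k.lower() in text.lower() for k in question_keywords)
    ]
    if not hits:
        return "No supporting memory; declining to answer."
    return max(hits)[1]  # tuples sort by date first, so newest wins

# Dated recall: only records on or before the as-of date are eligible.
answer(["PostgreSQL"], date(2025, 2, 1))  # → "Primary database: PostgreSQL 13"
# Clean refusal: no stored record mentions Redis.
answer(["Redis"], date(2025, 2, 1))
```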

Load the adapter

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "AsadIsmail/prism-memory"

# The tokenizer is loaded from the adapter repo.
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)

# Load the full base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)

This repo contains adapter weights only. You still need the base model.
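After loading, generations can be produced with the usual tokenizer and `model.generate` calls. The helper below then splits a raw generation into proposition-level records; the one-proposition-per-line, optionally bulleted output format is an assumption for illustration, not a documented contract of this release:

```python
def parse_memory_records(generation: str) -> list[str]:
    """Split a raw generation into proposition-level memory records.

    Assumes the adapter emits one proposition per line, optionally
    prefixed with a bullet or dash (an assumption, not a documented
    output contract of this release).
    """
    records = []
    for line in generation.splitlines():
        text = line.strip().lstrip("-*• ").strip()
        if text:
            records.append(text)
    return records

raw = """- Primary database: PostgreSQL 13
- Use Redis as an in-memory caching layer"""
parse_memory_records(raw)
# → ["Primary database: PostgreSQL 13",
#    "Use Redis as an in-memory caching layer"]
```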

Training data

PRISM-Memory was trained on synthetic multi-session memory conversations with GPT-4.1-derived memory-writing labels. The public release does not use real user chat logs.

| Item | Count | Notes |
|------|-------|-------|
| Synthetic training conversations | 2,329 | multi-session conversations with inserts, updates, and deletes |
| Synthetic held-out conversations | 584 | evaluation split used for held-out examples |
| Supervised extraction examples | 100,427 | memory-writing labels derived from the synthetic corpus |
| Released training subset | 20,000 | supervised examples used for the public adapter |

Example training item

Synthetic scenario

  • Domain: cloud infrastructure performance optimization
  • Persona: senior cloud systems engineer at a fintech startup

Synthetic user turn

Here’s the initial architecture outline: deploy microservices on AWS Fargate, use PostgreSQL 13 as the primary database, plan Kubernetes orchestration, use Redis for caching, and keep API latency under 50ms.

Target memory records

  • Deploy microservices on AWS Fargate
  • Orchestrate containers on a Kubernetes cluster (planned)
  • Primary database: PostgreSQL 13
  • Use Redis as an in-memory caching layer
  • Latency target: API responses under 50ms

The release makes the dataset design, counts, and example records public. It does not bundle the full raw corpus files.
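The training item above can be viewed as one supervised extraction example mapping a user turn to its target records. The field names in this sketch are illustrative, not the released corpus schema:

```python
# Illustrative shape of one supervised extraction example. Field names
# are assumptions for exposition, not the released dataset schema.
example = {
    "domain": "cloud infrastructure performance optimization",
    "persona": "senior cloud systems engineer at a fintech startup",
    "turn": (
        "Here's the initial architecture outline: deploy microservices on "
        "AWS Fargate, use PostgreSQL 13 as the primary database, plan "
        "Kubernetes orchestration, use Redis for caching, and keep API "
        "latency under 50ms."
    ),
    "target_records": [
        "Deploy microservices on AWS Fargate",
        "Orchestrate containers on a Kubernetes cluster (planned)",
        "Primary database: PostgreSQL 13",
        "Use Redis as an in-memory caching layer",
        "Latency target: API responses under 50ms",
    ],
}

len(example["target_records"])  # → 5
```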

Confirmed results

| Benchmark | PRISM-Memory | GPT-4.1-based PropMem reference |
|-----------|--------------|---------------------------------|
| LongMemEval | 0.4768 | 0.4650 |
| LoCoMo | 0.4981 | 0.5360 |

The reproduced evaluation matched the cached QA surface exactly: 460 cache hits, 0 misses.

Extraction examples

Infrastructure bottlenecks stay structured

  • Session date: 2025-01-04 15:34:00
  • Overlap score: 0.909
  • Note: Near-exact match on two operational facts from a single held-out turn.

Turn

yeah, no real caching beyond basic Docker layer caching. Jenkins nodes have limited capacity, and we sometimes hit queue delays during peak commits.

GPT-4.1 reference

  • No caching beyond basic Docker layer caching
  • Jenkins nodes have limited capacity and experience queue delays during peak commits

PRISM-Memory

  • No Docker caching beyond basic layer caching
  • Jenkins nodes have limited capacity; peak commits cause queue delays

Numeric constraints and preferences survive extraction

  • Session date: 2025-03-01 15:07:00
  • Overlap score: 0.800
  • Note: The trained model keeps both the hard concurrency cap and the desired notification style.

Turn

yeah, I think starting with incremental scans and parallel matrix jobs makes sense. We have 20 concurrent jobs max on GitHub Actions currently. Also want to keep Slack notifications from Snyk consistent with other pipeline alerts—aggregated and concise. Can you help draft the workflow?

GPT-4.1 reference

  • GitHub Actions concurrency limit: 20 concurrent jobs
  • Wants Snyk Slack notifications aggregated and concise, consistent with other pipeline alerts

PRISM-Memory

  • GitHub Actions concurrency limit: 20 concurrent jobs
  • Snyk Slack notifications should be aggregated and concise

More held-out examples live in docs/release/extraction-examples.md.
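The release does not spell out how the overlap scores above are computed. One plausible reading, sketched here purely as an assumption, is the mean best-match token overlap (Jaccard) of each reference record against the model's records:

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace-token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def record_overlap(reference: list[str], predicted: list[str]) -> float:
    """Mean best-match token overlap of each reference record against the
    predicted records. A guess at the metric, not the released formula."""
    if not reference:
        return 0.0
    return sum(
        max((token_jaccard(ref, pred) for pred in predicted), default=0.0)
        for ref in reference
    ) / len(reference)

ref = ["GitHub Actions concurrency limit: 20 concurrent jobs"]
pred = ["GitHub Actions concurrency limit: 20 concurrent jobs"]
record_overlap(ref, pred)  # identical records → 1.0
```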

Demo

The companion Space is live at https://huggingface.co/spaces/AsadIsmail/prism-memory.

Limitations

  • This is a memory-writing component, not a general chat model.
  • It is a LoRA adapter, not a standalone full checkpoint.
  • The evaluation pipeline still uses a separate QA model to score retrieved memory.
  • Temporal and inferential categories still trail stronger larger-model baselines.