PAT-ER — Primitive-Augmented Transformer with Event-Role Stream
PAT-ER is a decoder-only causal language model whose hidden computation is shaped by two learned side-state streams — an event-role stream and a primitive stream — in addition to the normal token stream. This checkpoint is a research prototype warm-started from Qwen/Qwen3-0.6B with the PAT-ER side-state, then given a decoupled product-generation interface.
Research prototype. PAT-ER proposes and predicts; it does not prove, certify, or guarantee. Not safety-certified. Not a theorem prover.
Intended use
Research / reviewer evaluation of (a) side-state reasoning signals (primitive class, role-to-primitive bridge, support, IDK) and (b) schema-grounded interface generation (evidence-grounded answers, abduction-as-hypothesis, contradiction, valid JSON, and Hermes-style tool calls).
Two modes (same checkpoint)
A forward(..., interface_mode=bool) flag selects the mode:
- base mode (interface_mode=False, default): side-state / matrix behavior. Runs the original frozen Condition-D blocks; the auxiliary heads (primitive, support, role-to-primitive, IDK, …) and the LM are the Condition-D path. Use this to read side-state predictions.
- interface mode (interface_mode=True): product-format generation. Runs mode-gated copies of the top-2 decoder blocks whose adapted attention produces tool calls and structured answers. interface_mode=True is required for product-format generation; base mode does not emit tool calls.
The interface is decoupled: base mode is byte-identical to the underlying Condition-D model, so adding product generation does not change side-state behavior.
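The decoupling described above can be pictured with a toy sketch. This is not the PAT-ER implementation: `ModeGatedStack` and its blocks are hypothetical names, and the "blocks" are plain Python callables standing in for decoder layers. It only illustrates the idea that the flag swaps the top blocks for adapted copies while the frozen originals stay untouched.

```python
from typing import Callable, List, Sequence

# A "block" here is any function mapping a hidden vector to a hidden vector.
Block = Callable[[List[float]], List[float]]


class ModeGatedStack:
    """Toy stand-in for a decoder stack with mode-gated copies of its top blocks."""

    def __init__(self, frozen_blocks: Sequence[Block], adapted_top_blocks: Sequence[Block]):
        if len(adapted_top_blocks) > len(frozen_blocks):
            raise ValueError("cannot adapt more blocks than the stack contains")
        self.frozen = list(frozen_blocks)        # original blocks, never modified
        self.adapted = list(adapted_top_blocks)  # copies used only in interface mode

    def forward(self, hidden: List[float], interface_mode: bool = False) -> List[float]:
        split = len(self.frozen) - len(self.adapted)
        lower = self.frozen[:split]
        # Base mode runs the original top blocks; interface mode runs the adapted copies.
        top = self.adapted if interface_mode else self.frozen[split:]
        for block in lower + top:
            hidden = block(hidden)
        return hidden


def scale(factor: float) -> Block:
    return lambda h: [x * factor for x in h]


def shift(offset: float) -> Block:
    return lambda h: [x + offset for x in h]


frozen = [shift(1.0), scale(2.0), shift(3.0), scale(4.0)]
stack = ModeGatedStack(frozen, adapted_top_blocks=[shift(10.0), scale(0.5)])
base_out = stack.forward([1.0])                            # original path only
interface_out = stack.forward([1.0], interface_mode=True)  # adapted top-2 path
```

Because base mode never touches the adapted copies, its output matches what the frozen blocks alone would produce, which is the property the card describes.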
Key measured results
Architecture contribution (8 seeds) — adding PAT-ER side-state registers to the same Qwen3-0.6B backbone (Condition B → C):
| Metric | Δ (B→C) | 95% CI |
|---|---|---|
| primitive macro-F1 | +0.209 | [+0.182, +0.237] |
| role-to-primitive macro-F1 | +0.090 | [+0.074, +0.110] |
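For readers unfamiliar with the metric, macro-F1 is the unweighted mean of per-class F1 scores. The card does not say which implementation produced the numbers above, so the following is only a minimal reference sketch of the standard definition; the function name and the toy labels are hypothetical.

```python
from typing import Hashable, Sequence


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = set(gold) | set(pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive for any seen label.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not PAT-ER primitive classes.
toy_score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

A Δ in the table is then the difference between the macro-F1 of two conditions evaluated on the same data.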
Usable interface (3 seeds, 242 held-out prompts, interface_mode=True):
| Metric | Result |
|---|---|
| Hermes tool-call parse | 0.952 |
| tool arguments exact (no hallucinated args) | 0.981 |
| JSON valid / keys | 1.000 |
| IDK precision/recall/F1 | 1.000 |
| base-mode primitive / r2p / LM | unchanged vs Condition D |
Known limit: unseen tool-name exact accuracy is 0.886 — semantic substitution and multi-token truncation on a minority of unseen names. Robust schema-grounded function calling is supported; perfect tool-name copying is not.
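When auditing tool-name errors like these, it can help to separate truncations from other mismatches. The sketch below is a rough heuristic, not the project's evaluation code: `classify_tool_name` is a hypothetical helper, a "truncation" is assumed to be a non-empty proper prefix of the expected name, and every other mismatch is counted as a substitution without any semantic check.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_tool_name(predicted: str, expected: str) -> str:
    """Label a predicted tool name as exact, truncation, or substitution."""
    if predicted == expected:
        return "exact"
    # Assumption: a truncated name is a non-empty proper prefix of the expected name.
    if predicted and expected.startswith(predicted):
        return "truncation"
    return "substitution"


def tally(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count outcome labels over (predicted, expected) pairs."""
    return Counter(classify_tool_name(p, e) for p, e in pairs)


# Hypothetical tool names, for illustration only.
counts = tally([
    ("get_weather", "get_weather"),
    ("get_wea", "get_weather"),
    ("fetch_forecast", "get_weather"),
])
exact_rate = counts["exact"] / sum(counts.values())
```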
Files
| File | Description |
|---|---|
| model.safetensors | weights (warm-started Qwen3-0.6B backbone + PAT-ER side-state + interface layers) |
| config.json | PATERConfig (incl. interface_adapt_layers: 2) |
| pat_er/ | minimal source package needed to load the custom architecture |
| tokenizer.json, tokenizer_config.json, chat_template.jinja, pater_manifest.json | Qwen3 tokenizer extended with the PAT-ER token spec |
| example_usage.py | load + run example (base mode and interface_mode) |
| eval_prompts.jsonl | synthetic held-out product-eval fixture (generated; no external dataset text) |
Usage
```shell
pip install torch safetensors transformers
python3 example_usage.py
```

```python
import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

from pat_er import PATERConfig, PATERForCausalLM

# Build the config from config.json, keeping only the fields PATERConfig declares.
with open("config.json") as f:
    cfg = json.load(f)
config = PATERConfig(**{k: v for k, v in cfg.items() if k in PATERConfig.__dataclass_fields__})

model = PATERForCausalLM(config)
model.load_state_dict(load_file("model.safetensors"), strict=False)  # lm_head is re-tied
model.eval()
tok = AutoTokenizer.from_pretrained(".")

# interface_mode=True -> tool call; interface_mode=False -> side-state (Condition D)
input_ids = tok("...", return_tensors="pt").input_ids
with torch.no_grad():  # inference only, no gradients needed
    out = model(input_ids=input_ids, interface_mode=True)
```
Datasets
Side-state training uses upstream reasoning datasets (ProofWriter, FOLIO, and synthetic
primitive data) through local converters. Converted external datasets are not
redistributed here. The included eval_prompts.jsonl is a synthetic, generated fixture
(hand-written reasoning templates + synthetic tool schemas) and contains no external
dataset text.
Limitations & safety
- Research prototype; not safety-certified; not a theorem prover.
- Contradiction/deep reasoning is partial; primitive/support labels are predictions.
- Tool calls are proposals. A well-formed, name-grounded <tool_call> is not approval to execute: validate the tool name and arguments against the real schema before any execution, and keep a human/verifier in the loop for irreversible actions.
- 760M scale is not validated; this checkpoint is ~762M (Qwen3-0.6B backbone + PAT-ER).
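The validation step recommended in the list above might look like the following sketch. It assumes the common Hermes-style layout, a JSON object with "name" and "arguments" wrapped in <tool_call> tags, and a hypothetical `schemas` dict that maps each real tool name to a JSON-Schema-like parameter description. The helper names and the `get_weather` tool are illustrative and not part of the pat_er package. Passing these checks is still not approval to execute.

```python
import json
import re
from typing import Any, Dict, List, Optional

# Assumed Hermes-style wrapper: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_call(text: str) -> Optional[Dict[str, Any]]:
    """Extract the first tool call from generated text, or None if malformed."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        return None
    if not isinstance(call.get("arguments", {}), dict):
        return None
    return call


def validate_tool_call(call: Dict[str, Any], schemas: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    schema = schemas.get(call["name"])
    if schema is None:
        return [f"unknown tool name: {call['name']!r}"]
    args = call.get("arguments", {})
    allowed = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    problems = [f"unexpected argument: {name}" for name in sorted(set(args) - allowed)]
    problems += [f"missing required argument: {name}" for name in sorted(required - set(args))]
    return problems


# Hypothetical schema and model output, for illustration only.
schemas = {"get_weather": {"properties": {"city": {"type": "string"}}, "required": ["city"]}}
generated = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Oslo"}}\n</tool_call>'
call = parse_tool_call(generated)
problems = validate_tool_call(call, schemas) if call is not None else ["no parseable tool call"]
```

This checks only the tool name and argument keys. Type checks on argument values, and the human or verifier step for irreversible actions, would sit on top of it.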
Project
Source, training, and evaluation code: https://github.com/Pronto-Sage/primitive-augmented-transformer
Base model: Qwen/Qwen3-0.6B (Apache-2.0). This derivative is released under Apache-2.0.