PAT-ER — Primitive-Augmented Transformer with Event-Role Stream

PAT-ER is a decoder-only causal language model whose hidden computation is shaped by two learned side-state streams (an event-role stream and a primitive stream) in addition to the normal token stream. This checkpoint is a research prototype: it is warm-started from Qwen/Qwen3-0.6B, extended with the PAT-ER side-state, and then given a decoupled product-generation interface.

Research prototype. PAT-ER proposes and predicts; it does not prove, certify, or guarantee. Not safety-certified. Not a theorem prover.

Intended use

Research / reviewer evaluation of (a) side-state reasoning signals (primitive class, role-to-primitive bridge, support, IDK) and (b) schema-grounded interface generation (evidence-grounded answers, abduction-as-hypothesis, contradiction, valid JSON, and Hermes-style tool calls).

Two modes (same checkpoint)

A forward(..., interface_mode=bool) flag selects the mode:

Base mode (interface_mode=False, the default): side-state / matrix behavior. Runs the original frozen Condition-D blocks, so the auxiliary heads (primitive, support, role-to-primitive, IDK, …) and the LM follow the Condition-D path. Use this mode to read side-state predictions.
Interface mode (interface_mode=True): product-format generation. Runs mode-gated copies of the top-2 decoder blocks, whose adapted attention produces tool calls and structured answers. This mode is required for product-format generation; base mode does not emit tool calls.

The interface is decoupled: base mode is byte-identical to the underlying Condition-D model, so adding product generation does not change any side-state behavior.

Key measured results

Architecture contribution (8 seeds) — adding PAT-ER side-state registers to the same Qwen3-0.6B backbone (Condition B → C):

Metric	Δ (B→C)	95% CI
primitive macro-F1	+0.209	[+0.182, +0.237]
role-to-primitive macro-F1	+0.090	[+0.074, +0.110]
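
For readers less familiar with the metric in the table above, here is a minimal sketch of macro-F1: the unweighted mean of per-class F1 scores. This is not the project's evaluation code. The function name `macro_f1` and the class labels `p0`, `p1`, `p2` are hypothetical, and the toy labels are invented for illustration only.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    classes = set(gold) | set(pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy labels, invented for illustration (not PAT-ER data).
gold = ["p0", "p0", "p1", "p2"]
pred = ["p0", "p1", "p1", "p2"]
score = macro_f1(gold, pred)  # per-class F1: p0=2/3, p1=2/3, p2=1 -> mean 7/9
```

Because every class counts equally, a rare class that is often mislabeled lowers macro-F1 as much as a frequent one would.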

Usable interface (3 seeds, 242 held-out prompts, interface_mode=True):

Metric	Result
Hermes tool-call parse	0.952
tool arguments exact (no hallucinated args)	0.981
JSON valid / keys	1.000
IDK precision/recall/F1	1.000
base-mode primitive / r2p / LM	unchanged vs Condition D

Known limit: exact-match accuracy on unseen tool names is 0.886. The errors are semantic substitutions and multi-token truncations on a minority of unseen names. Robust schema-grounded function calling is supported; perfect tool-name copying is not.
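
The two failure types named above can be told apart with a simple heuristic when inspecting outputs. The sketch below is an assumption, not the project's evaluation procedure: the helper names, the prefix test for truncation, and the example tool names are all hypothetical. The "substitution" bucket is a catch-all and does not check semantic similarity.

```python
def classify_tool_name(predicted: str, gold: str) -> str:
    """Bucket a predicted tool name against the gold name.

    "exact"        - identical strings
    "truncation"   - a non-empty proper prefix of the gold name
    "substitution" - anything else (catch-all; similarity is not checked)
    """
    if predicted == gold:
        return "exact"
    if predicted and gold.startswith(predicted):
        return "truncation"
    return "substitution"


def exact_name_accuracy(pairs):
    """Fraction of (predicted, gold) pairs whose names match exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(classify_tool_name(p, g) == "exact" for p, g in pairs)
    return hits / len(pairs)


# Invented (predicted, gold) tool names, for illustration only.
pairs = [
    ("get_weather", "get_weather"),
    ("fetch_invoice", "fetch_invoice_pdf"),
    ("lookup_user", "find_user"),
]
buckets = [classify_tool_name(p, g) for p, g in pairs]
# buckets == ["exact", "truncation", "substitution"]
```

A bucket other than "exact" should be treated as a reason to reject the call, not as a hint for silently repairing the name.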

Files

File	Description
`model.safetensors`	weights (warm-started Qwen3-0.6B backbone + PAT-ER side-state + interface layers)
`config.json`	`PATERConfig` (incl. `interface_adapt_layers: 2`)
`pat_er/`	minimal source package needed to load the custom architecture
`tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`, `pater_manifest.json`	Qwen3 tokenizer extended with the PAT-ER token spec
`example_usage.py`	load + run example (base mode and `interface_mode`)
`eval_prompts.jsonl`	synthetic held-out product-eval fixture (generated; no external dataset text)

Usage

pip install torch safetensors transformers
python3 example_usage.py

import json, torch
from safetensors.torch import load_file
from transformers import AutoTokenizer
from pat_er import PATERConfig, PATERForCausalLM

with open("config.json") as f:  # close the file handle once the config is read
    cfg = json.load(f)
config = PATERConfig(**{k: v for k, v in cfg.items() if k in PATERConfig.__dataclass_fields__})
model = PATERForCausalLM(config)
model.load_state_dict(load_file("model.safetensors"), strict=False)  # lm_head is re-tied
model.eval()
tok = AutoTokenizer.from_pretrained(".")

# interface_mode=True -> tool call;  interface_mode=False -> side-state (Condition D)
input_ids = tok("...", return_tensors="pt").input_ids
with torch.no_grad():  # inference only, so skip gradient tracking
    out = model(input_ids=input_ids, interface_mode=True)

Datasets

Side-state training uses upstream reasoning datasets (ProofWriter, FOLIO, and synthetic primitive data) through local converters. Converted external datasets are not redistributed here. The included eval_prompts.jsonl is a synthetic, generated fixture (hand-written reasoning templates + synthetic tool schemas) and contains no external dataset text.

Limitations & safety

Research prototype; not safety-certified; not a theorem prover.
Contradiction handling and deep reasoning are only partially supported; primitive and support labels are model predictions, not verified facts.
Tool calls are proposals. A well-formed, name-grounded <tool_call> is not approval to execute — validate the tool name and arguments against the real schema before any execution, and keep a human/verifier in the loop for irreversible actions.
760M scale is not validated; this checkpoint is ~762M (Qwen3-0.6B backbone + PAT-ER).
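
The validation step described in the tool-call item above could look roughly like the following. This is a minimal sketch under stated assumptions, not part of the `pat_er` package. It assumes the Hermes-style convention of a JSON object with `name` and `arguments` keys wrapped in `<tool_call>` tags. The function `validate_tool_call` and the simplified schema layout (`required` / `allowed` argument lists) are hypothetical; a real deployment would more likely validate against full JSON Schema definitions.

```python
import json
import re

# Matches one <tool_call>{...}</tool_call> block, including nested braces.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def validate_tool_call(text, schemas):
    """Return (ok, reason, call). Never executes anything."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return False, "no <tool_call> block found", None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return False, "tool call is not valid JSON", None

    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in schemas:
        return False, f"unknown tool name: {name!r}", call
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object", call

    spec = schemas[name]
    missing = sorted(set(spec["required"]) - set(args))
    extra = sorted(set(args) - set(spec["allowed"]))
    if missing:
        return False, f"missing required arguments: {missing}", call
    if extra:
        return False, f"arguments not in schema: {extra}", call
    return True, "ok", call


# Hypothetical schema registry and model outputs, for illustration only.
schemas = {"get_weather": {"required": ["city"], "allowed": ["city", "units"]}}
good = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Oslo"}}\n</tool_call>'
bad_name = '<tool_call>{"name": "weather_lookup", "arguments": {"city": "Oslo"}}</tool_call>'
extra_arg = '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo", "days": 3}}</tool_call>'

results = [validate_tool_call(t, schemas)[:2] for t in (good, bad_name, extra_arg)]
```

A passing result only means the call is well-formed against the registry. Whether it should run is still a decision for the human or verifier in the loop.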

Project

Source, training, and evaluation code: https://github.com/Pronto-Sage/primitive-augmented-transformer

Base model: Qwen/Qwen3-0.6B (Apache-2.0). This derivative is released under Apache-2.0.
