Reflex-Coder7B-RISCV
A frozen Qwen2.5-Coder-7B-Instruct wired to a RISC-V CPU through Flamingo-style cross-attention. Emits one 32-bit RV32I instruction per cycle, conditioned on live machine state. No text tokens generated at inference.
This repo contains the adapter weights only (~4.2 GB fp32). The frozen backbone is pulled from Qwen/Qwen2.5-Coder-7B-Instruct at runtime. Total inference memory: ~18 GB bf16 backbone + 4.2 GB fp32 adapters + activations.
What it does
Given a natural-language prompt ("say hi", "multiply 7 and 8", "compute 5 factorial"), Reflex drives a Unicorn-backed RV32I emulator instruction by instruction. Each cycle:
- Read live CPU state (32 registers, PC, memory windows around PC and SP).
- Encode it as 65 K/V tokens.
- Run the frozen backbone forward over the prompt; cross-attention adapters fuse the state K/V into the hidden states at depths 4, 8, 12, 16, 20, 24.
- Last-token pool → MLP → 32 per-bit sigmoid heads → one 32-bit RV32I instruction word.
- Write the word at PC in Unicorn, step one cycle, loop.
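The cycle above can be sketched as a minimal Python loop body. Everything here is a hypothetical stand-in: `encode_state`, `backbone_with_adapters`, and `bit_heads` are stubs for the real Reflex components, and the Unicorn write/step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_state(regs, pc, mem):
    """Stub: pack the 32 registers, PC, and memory windows into 65 state tokens."""
    flat = np.concatenate([regs, [pc], mem])[:65 * 4]
    flat = np.pad(flat, (0, 65 * 4 - len(flat)))
    return flat.reshape(65, 4).astype(np.float32)  # 65 tokens

def backbone_with_adapters(prompt, state_tokens):
    """Stub: frozen-backbone forward pass with cross-attn fusion (random here)."""
    return rng.standard_normal(16).astype(np.float32)

def bit_heads(hidden):
    """Stub: 32 independent sigmoid heads, thresholded into one 32-bit word."""
    logits = rng.standard_normal(32)
    bits = (1.0 / (1.0 + np.exp(-logits))) > 0.5
    return int(sum(int(b) << i for i, b in enumerate(bits)))

def run_cycle(prompt, regs, pc, mem):
    state = encode_state(regs, pc, mem)        # read live CPU state
    hidden = backbone_with_adapters(prompt, state)
    word = bit_heads(hidden)                   # one RV32I instruction word
    return word                                # real loop: write at PC, step Unicorn

word = run_cycle("say hi", np.zeros(32), 0x1000, np.zeros(64))
```

The real model would then write `word` at PC in the emulator, step one cycle, and repeat until a halt condition or `max_cycles`.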
Base model
Qwen/Qwen2.5-Coder-7B-Instruct: frozen, bf16, untouched.
Training
- Corpus: 80,396 (prompt, program) pairs across 56 RV32I program families (arithmetic, loops, comparisons, memory ops, display writes). Every program verified by running it end-to-end through Unicorn before training.
- Flattened cycle pool: ~1.06M `(state, next_instruction)` pairs, balanced-subsampled to 173k across families per epoch.
- Objective: per-bit binary cross-entropy over the 32 instruction bits, with the rs2 bits (positions 20-24) weighted 5× to overcome the register/immediate polysemy ceiling.
- Optimizer: standard AdamW with a cosine LR schedule, `1e-4 → 1e-6` over 20k steps, batch size 64.
- Hardware: A100 80GB.
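The objective above can be written out concretely. This is a hedged NumPy sketch (the actual training code is not in this card) showing per-bit BCE with the 5× upweighting of the rs2 slice:

```python
import numpy as np

RS2_BITS = slice(20, 25)  # instruction bits 20-24 hold rs2

def weighted_bit_bce(probs, target_word, rs2_weight=5.0):
    """Per-bit binary cross-entropy over 32 bits; probs[i] = P(bit i == 1)."""
    targets = np.array([(target_word >> i) & 1 for i in range(32)], dtype=np.float64)
    p = np.clip(probs, 1e-7, 1 - 1e-7)                  # avoid log(0)
    bce = -(targets * np.log(p) + (1 - targets) * np.log1p(-p))
    weights = np.ones(32)
    weights[RS2_BITS] = rs2_weight                      # rs2 bits weighted 5x
    return float((weights * bce).sum() / weights.sum())
```

With this weighting, a confidently wrong rs2 bit costs five times as much as any other wrong bit, which is the intended pressure against register/immediate confusion.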
Results (18-task eval + 15-task sweep)
- 13 / 15 on a mixed zero-shot sweep (see README), including six tasks the model was never trained on: multiply-by-repeated-add, power, abs, min, popcount, say-arbitrary-3-char-strings.
- popcount(255) = 8 in 199 consecutive correct RISC-V instructions: an emergent algorithm derived at inference from the frozen backbone's prior on what "popcount" means.
- Full eval script: `uv run eval --checkpoint reflex_coder7b.pt`.
Usage
```python
import torch
from reflex.demo import load, run_grounded

model, tok, cfg = load("reflex_coder7b.pt", device="cuda")
cpu, emitted, halted, err = run_grounded(
    model, tok, "multiply 7 and 8", device="cuda", max_cycles=200,
)
print(f"halted={halted} mem[0x5000]={cpu.mem_word(0x5000)}")
# halted=True mem[0x5000]=56
```
Or, interactively:
```shell
uv run demo --checkpoint reflex_coder7b.pt
```
Installation
```shell
git clone https://github.com/ilbertt/reflex
cd reflex
uv sync

# Download this checkpoint into the repo root:
huggingface-cli download ilbertt/reflex-coder7b-riscv reflex_coder7b.pt --local-dir .
```
The first time you run inference, HuggingFace will automatically fetch the frozen Qwen2.5-Coder-7B-Instruct backbone (~15 GB).
Limitations
- rs2 precision ceiling. Per-cycle rs2 accuracy ~0.99; long loops (>50 ops) can emit a single-bit-wrong instruction that crashes the program before it stores its result.
- No domain-knowledge transfer. Reflex only knows the program-shaped phrasings in its training corpus. Prompts like `"if x5 is fever, display SICK"` fail: the adapters were never taught to route the backbone's semantic knowledge of "fever" through.
- Display strings degrade past 3 characters. `say hi`, `say 42`, and `say wow` all land cleanly; `say hello` returns `hell`.
- Some common phrasings are unreliable. `add 100 and 200 and store the result` can return `100` instead of `300`; `subtract 10 from 25` sometimes returns `35` (semantic confusion on the word "from").
- RV32I base ISA only: no M (multiply/divide), no Zbb (count/bitmanip), no F (float). The model synthesizes all "higher" operations from base instructions.
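To make the rs2 precision ceiling concrete: rs2 sits at a fixed bit position in every R-type word, so a single flipped bit there silently swaps the source register. This small decoder follows the standard RV32I field layout (it is illustrative, not part of the repo):

```python
def decode_rtype(word: int) -> dict:
    """Split a 32-bit RV32I R-type instruction word into its fields.

    rs2 occupies bits 20-24 -- the slice the training loss weights 5x.
    """
    return {
        "opcode": word & 0x7F,          # bits 0-6
        "rd":     (word >> 7) & 0x1F,   # bits 7-11
        "funct3": (word >> 12) & 0x7,   # bits 12-14
        "rs1":    (word >> 15) & 0x1F,  # bits 15-19
        "rs2":    (word >> 20) & 0x1F,  # bits 20-24
        "funct7": (word >> 25) & 0x7F,  # bits 25-31
    }

# add x10, x11, x12 encodes as 0x00C58533 in the standard RV32I encoding
fields = decode_rtype(0x00C58533)
```

Here `fields["rs2"]` is 12 (register x12); one wrong bit in positions 20-24 would read a different register entirely, which is why per-bit accuracy of ~0.99 still fails on long programs.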
Files
- `reflex_coder7b.pt`: adapter weights, state encoder, head, and config dict (`backbone_id`, `hidden`, `inject_every`, `adapter_mlp_ratio`, `max_instr_tokens`, `chat_template`, `context_prefix`).