# Squeez-2B
Squeez-2B is a 2B parameter model fine-tuned from Qwen 3.5 2B for task-conditioned tool-output pruning in coding agents. Given a focused query and one raw tool observation, it extracts the smallest verbatim evidence block the agent should inspect next — removing 92% of input tokens while retaining 0.86 recall.
```
Tool output (500 lines) → Squeez → Relevant lines (30 lines) → Agent context
```
- Outperforms zero-shot Qwen 3.5 35B A3B by +11 recall points
- Returns verbatim lines only (no rewriting or summarization)
- Works as CLI pipe, Python library, or vLLM server
- Trained on 27 tool types from real SWE-bench workflows and synthetic multi-ecosystem outputs
Resources: Paper (coming soon) | Dataset | Code & CLI | Blog post
## Results
Evaluated on 618 manually curated held-out examples spanning 27 tool types.
| Model | Prec. | Recall | F1 | Compression |
|---|---|---|---|---|
| Squeez-2B | 0.80 | 0.86 | 0.80 | 0.92 |
| Qwen 3.5 35B A3B (zero-shot) | 0.74 | 0.75 | 0.73 | 0.92 |
| Kimi K2 (zero-shot) | 0.61 | 0.53 | 0.68 | 0.94 |
| Qwen 3.5 2B (untrained) | 0.42 | 0.53 | 0.55 | 0.82 |
The fine-tuned 2B model is also the most precise system in the comparison, indicating it has learned a tool-specific extraction policy rather than relying on generic instruction following.
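For reference, line-level precision, recall, F1, and compression can be scored like this; the `line_metrics` helper and its edge-case conventions are a hypothetical sketch, not the paper's evaluation code:

```python
def line_metrics(predicted: list[str], gold: list[str], total_lines: int):
    """Hypothetical line-level scorer: set overlap between predicted and
    gold lines; empty-set conventions here are an assumption."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 1.0   # empty prediction on a negative example counts as precise
    recall = tp / len(ref) if ref else 1.0        # negative example: empty gold set
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    compression = 1 - len(predicted) / total_lines
    return precision, recall, f1, compression
```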
### Qualitative patterns

| Pattern | Example | Squeez-2B | Baseline failure |
|---|---|---|---|
| Precise selection | git_log, 21 lines: find one commit | Selects the single correct entry | Qwen 35B picks a plausible but wrong commit |
| Failure-block extraction | Service log, 176 lines: two similar TLS errors | Returns the correct 5-line block | Qwen 35B picks the wrong TLS error (different timestamp) |
| Correct empty prediction | docker_logs, 316 lines: no matching evidence | Returns empty output | Qwen 35B generates "No relevant lines found..." |
| Adjacent over-selection | Build output, 110 lines: Dockerfile error | Finds the right error + nearby noise | Qwen 35B misses the Dockerfile error entirely |
On the 59 negative examples in the test set, Squeez-2B correctly returns empty output 80% of the time. Qwen 35B returns empty only 7% of the time.
## Quick Start

### CLI (recommended)

```bash
pip install squeez

# With vLLM server
vllm serve KRLabsOrg/squeez-2b --dtype bfloat16 --max-model-len 16384
export SQUEEZ_SERVER_URL=http://localhost:8000/v1

pytest -q 2>&1 | squeez "find the failure block"
git log --oneline -50 | squeez "find the commit that changed CSRF handling"
cat src/auth/middleware.py | squeez "find the referer validation logic"
```
### Python API

```python
from squeez.inference.extractor import ToolOutputExtractor

# vLLM server
extractor = ToolOutputExtractor(base_url="http://localhost:8000/v1")
# Or local
extractor = ToolOutputExtractor(model_path="KRLabsOrg/squeez-2b")

filtered = extractor.extract(
    task="Find the failing test block",
    tool_output=raw_output,
)
```
### With transformers directly

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "KRLabsOrg/squeez-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": (
        "You prune verbose tool output for a coding agent. "
        "Given a focused extraction query and one tool output, return only the "
        "smallest verbatim evidence block(s) the agent should read next. "
        "Return the kept text inside <relevant_lines> tags. "
        "Do not rewrite, summarize, or invent lines."
    )},
    {"role": "user", "content": (
        "<query>\nFind the failing authentication test\n</query>\n"
        "<tool_output>\n"
        "PASSED tests/test_login.py::test_valid_credentials\n"
        "FAILED tests/test_login.py::test_token_refresh - AssertionError: expected 200 got 401\n"
        "PASSED tests/test_login.py::test_logout\n"
        "</tool_output>"
    )},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# <relevant_lines>
# FAILED tests/test_login.py::test_token_refresh - AssertionError: expected 200 got 401
# </relevant_lines>
```
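Downstream code then needs to pull the kept lines out of the tagged response. A minimal parser might look like this (the `parse_relevant_lines` helper and its empty-on-missing-tags behavior are assumptions, not part of the squeez package):

```python
import re

def parse_relevant_lines(response: str) -> str:
    """Return the verbatim evidence block, or "" for an empty prediction.
    Hypothetical helper, not part of the squeez package."""
    m = re.search(r"<relevant_lines>\s*(.*?)\s*</relevant_lines>", response, re.DOTALL)
    return m.group(1) if m else ""
```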
## Input/Output Format

**Input** — chat messages with system prompt:

- System: extraction instructions (see above)
- User:

```
<query>
{task}
</query>
<tool_output>
{raw_output}
</tool_output>
```

**Output** — verbatim lines in XML tags:

```
<relevant_lines>
{only the lines that matter, copied verbatim}
</relevant_lines>
```
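Assembling the message pair from a task and a raw observation is mechanical; this sketch reuses the system prompt and tag layout from the transformers example above (the `build_messages` helper itself is illustrative, not a squeez API):

```python
SYSTEM_PROMPT = (
    "You prune verbose tool output for a coding agent. "
    "Given a focused extraction query and one tool output, return only the "
    "smallest verbatim evidence block(s) the agent should read next. "
    "Return the kept text inside <relevant_lines> tags. "
    "Do not rewrite, summarize, or invent lines."
)

def build_messages(task: str, raw_output: str) -> list[dict]:
    """Illustrative helper: wrap a task and one tool observation in the
    query/tool_output tag layout shown above."""
    user = f"<query>\n{task}\n</query>\n<tool_output>\n{raw_output}\n</tool_output>"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]
```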
## Supported Tool Types (27)

**SWE-bench derived (14):** read_file | grep | git_log | git_blame | git_diff | test_output | python | curl | pip_install | ls | lint_output | build_output | type_check | coverage

**Synthetic multi-ecosystem (13):** npm_build | tsc | npm_install | docker_logs | docker_build | make_cmake | kubectl | cargo_build | go_build | mvn_gradle | terraform | mypy_pyright | eslint
## Training Details

| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3.5-2B |
| Method | LoRA (r=16, alpha=32) via Unsloth |
| Training data | 10,508 examples (SWE-bench + synthetic) |
| Epochs | 3 |
| Max sequence length | 20,000 tokens |
| Learning rate | 2e-4 |
| Batch size | 8 (32 effective with 4x gradient accumulation) |
| Hardware | Single NVIDIA A100 80GB |
| Dataset | KRLabsOrg/tool-output-extraction-swebench |
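The adapter settings in the table map onto a peft-style configuration roughly as follows; note the actual run used Unsloth, and the `target_modules` list here is an assumption rather than the released training config:

```python
from peft import LoraConfig

# Rough peft-style equivalent of the table above. The actual training used
# Unsloth; target_modules is an assumed attention-projection set, not the
# released configuration.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```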
## Usage with Coding Agents

Add to your `CLAUDE.md` or agent system prompt:

```
When you invoke a shell command, pipe it through `squeez` and describe what you need.

Examples:
- bun test 2>&1 | squeez "did the tests pass?"
- git log --oneline -50 | squeez "find the commit that broke CSRF"
- cat src/auth/middleware.py | squeez "find the referer validation logic"
```
## Limitations

- Best on software engineering tool output; not designed for general-purpose summarization
- Synthetic data was generated by `openai/gpt-oss-120b`; it may not fully reflect real-world distributions for all ecosystems
- Evaluates single tool observations, not full agent trajectories
- Max input: 20,000 tokens (training length); can be extended at serving time
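When an observation may exceed the training length, one simple option is to pre-clip it, keeping the head and tail of the output. The sketch below uses a character budget as a stand-in for a real token count; the helper name and the ~4-characters-per-token heuristic are assumptions, not part of squeez:

```python
def clip_tool_output(text: str, max_chars: int = 80_000) -> str:
    """Keep the head and tail of an oversized observation.
    ~80k chars roughly approximates a 20k-token budget at ~4 chars/token;
    this heuristic is an assumption, not part of squeez."""
    if len(text) <= max_chars:
        return text
    half = max_chars // 2
    return text[:half] + "\n...[truncated]...\n" + text[-half:]
```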
## License
Apache 2.0
## Citation

```bibtex
@misc{kovacs2026squeez,
  title={Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents},
  author={Adam Kovacs},
  year={2026},
  url={https://github.com/KRLabsOrg/squeez}
}
```