
Squeez-2B

Squeez-2B is a 2B parameter model fine-tuned from Qwen 3.5 2B for task-conditioned tool-output pruning in coding agents. Given a focused query and one raw tool observation, it extracts the smallest verbatim evidence block the agent should inspect next — removing 92% of input tokens while retaining 0.86 recall.

Tool output (500 lines) → Squeez → Relevant lines (30 lines) → Agent context
  • Outperforms zero-shot Qwen 3.5 35B A3B by +11 recall points
  • Returns verbatim lines only (no rewriting or summarization)
  • Works as CLI pipe, Python library, or vLLM server
  • Trained on 27 tool types from real SWE-bench workflows and synthetic multi-ecosystem outputs

Resources: Paper (coming soon) | Dataset | Code & CLI | Blog post

Results

Evaluated on 618 manually curated held-out examples spanning 27 tool types.

Model                         Prec.  Recall  F1    Compression
Squeez-2B                     0.80   0.86    0.80  0.92
Qwen 3.5 35B A3B (zero-shot)  0.74   0.75    0.73  0.92
Kimi K2 (zero-shot)           0.61   0.53    0.68  0.94
Qwen 3.5 2B (untrained)       0.42   0.53    0.55  0.82
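
For reference, line-level precision, recall, F1, and compression can be computed as below. This is a minimal sketch of the standard metric definitions, not the project's scoring script, and it measures compression over lines where the card reports token-level figures:

```python
def line_metrics(predicted, gold, total_input_lines):
    """Line-level precision/recall/F1 plus compression (fraction of input removed)."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else (1.0 if not ref else 0.0)
    recall = tp / len(ref) if ref else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    compression = 1 - len(pred) / total_input_lines
    return precision, recall, f1, compression

# Perfect extraction of 1 line from a 100-line tool output:
p, r, f1, c = line_metrics(["FAILED test_token_refresh"], ["FAILED test_token_refresh"], 100)
print(p, r, f1, c)  # 1.0 1.0 1.0 0.99
```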

The fine-tuned 2B model is also the most precise system in the comparison, indicating it has learned a tool-specific extraction policy rather than relying on generic instruction following.

Qualitative patterns

  • Precise selection (git_log, 21 lines; find one commit): Squeez-2B selects the single correct entry, while Qwen 35B picks a plausible but wrong commit.
  • Failure-block extraction (service log, 176 lines; two similar TLS errors): Squeez-2B returns the correct 5-line block, while Qwen 35B picks the wrong TLS error (different timestamp).
  • Correct empty prediction (docker_logs, 316 lines; no matching evidence): Squeez-2B returns empty output, while Qwen 35B generates "No relevant lines found..."
  • Adjacent over-selection (build output, 110 lines; Dockerfile error): Squeez-2B finds the right error plus some nearby noise, while Qwen 35B misses the Dockerfile error entirely.

On the 59 negative examples in the test set, Squeez-2B correctly returns empty output 80% of the time. Qwen 35B returns empty only 7% of the time.

Quick Start

CLI (recommended)

pip install squeez

# With vLLM server
vllm serve KRLabsOrg/squeez-2b --dtype bfloat16 --max-model-len 16384
export SQUEEZ_SERVER_URL=http://localhost:8000/v1

pytest -q 2>&1 | squeez "find the failure block"
git log --oneline -50 | squeez "find the commit that changed CSRF handling"
cat src/auth/middleware.py | squeez "find the referer validation logic"

Python API

from squeez.inference.extractor import ToolOutputExtractor

# vLLM server
extractor = ToolOutputExtractor(base_url="http://localhost:8000/v1")

# Or local
extractor = ToolOutputExtractor(model_path="KRLabsOrg/squeez-2b")

filtered = extractor.extract(
    task="Find the failing test block",
    tool_output=raw_output,
)
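
An empty extraction is a valid signal (no matching evidence), not an error, so callers should not fall back to the raw dump. A minimal sketch of how a caller might normalize the result; `prune_observation` and the marker string are my own illustration, not part of the package:

```python
def prune_observation(extractor_output: str) -> str:
    """Normalize Squeez output for the agent context.

    An empty extraction means 'no matching evidence', so replace it
    with a short marker instead of reinjecting the raw tool output.
    """
    text = extractor_output.strip()
    return text if text else "[squeez] no relevant lines found"

print(prune_observation(""))  # [squeez] no relevant lines found
```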

With transformers directly

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "KRLabsOrg/squeez-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": (
        "You prune verbose tool output for a coding agent. "
        "Given a focused extraction query and one tool output, return only the "
        "smallest verbatim evidence block(s) the agent should read next. "
        "Return the kept text inside <relevant_lines> tags. "
        "Do not rewrite, summarize, or invent lines."
    )},
    {"role": "user", "content": (
        "<query>\nFind the failing authentication test\n</query>\n"
        "<tool_output>\n"
        "PASSED tests/test_login.py::test_valid_credentials\n"
        "FAILED tests/test_login.py::test_token_refresh - AssertionError: expected 200 got 401\n"
        "PASSED tests/test_login.py::test_logout\n"
        "</tool_output>"
    )},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
# <relevant_lines>
# FAILED tests/test_login.py::test_token_refresh - AssertionError: expected 200 got 401
# </relevant_lines>
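
The tagged response can be unwrapped with a small helper; `parse_relevant_lines` is my own illustration, not part of the released package:

```python
import re

def parse_relevant_lines(response: str) -> str:
    """Extract the verbatim evidence block from a Squeez response.

    Returns an empty string when the tags are absent or empty, which
    the model uses to signal 'no matching evidence'.
    """
    match = re.search(r"<relevant_lines>\n?(.*?)\n?</relevant_lines>", response, re.DOTALL)
    return match.group(1) if match else ""

resp = "<relevant_lines>\nFAILED tests/test_login.py::test_token_refresh\n</relevant_lines>"
print(parse_relevant_lines(resp))  # FAILED tests/test_login.py::test_token_refresh
```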

Input/Output Format

Input — Chat messages with system prompt:

  • System: extraction instructions (see above)
  • User: <query>{task}</query>\n<tool_output>{raw_output}</tool_output>

Output — Verbatim lines in XML tags:

<relevant_lines>
{only the lines that matter, copied verbatim}
</relevant_lines>
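
If you assemble the prompt yourself, a small helper keeps the tag layout consistent with the format above (`build_user_message` is hypothetical; the packaged extractor constructs this internally):

```python
def build_user_message(task: str, raw_output: str) -> str:
    """Wrap the query and tool output in the tag layout Squeez expects."""
    return f"<query>\n{task}\n</query>\n<tool_output>\n{raw_output}\n</tool_output>"

msg = build_user_message("Find the failing test", "FAILED test_a\nPASSED test_b")
print(msg)
```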

Supported Tool Types (27)

SWE-bench derived (14): read_file | grep | git_log | git_blame | git_diff | test_output | python | curl | pip_install | ls | lint_output | build_output | type_check | coverage

Synthetic multi-ecosystem (13): npm_build | tsc | npm_install | docker_logs | docker_build | make_cmake | kubectl | cargo_build | go_build | mvn_gradle | terraform | mypy_pyright | eslint

Training Details

Base model:          Qwen/Qwen3.5-2B
Method:              LoRA (r=16, alpha=32) via Unsloth
Training data:       10,508 examples (SWE-bench + synthetic)
Epochs:              3
Max sequence length: 20,000 tokens
Learning rate:       2e-4
Batch size:          8 (effective 32 with 4x gradient accumulation)
Hardware:            single NVIDIA A100 80GB
Dataset:             KRLabsOrg/tool-output-extraction-swebench

Usage with Coding Agents

Add to your CLAUDE.md or agent system prompt:

When you invoke a shell command, pipe it through `squeez` and describe what you need.
Examples:
- bun test 2>&1 | squeez "did the tests pass?"
- git log --oneline -50 | squeez "find the commit that broke CSRF"
- cat src/auth/middleware.py | squeez "find the referer validation logic"

Limitations

  • Best on software engineering tool output; not designed for general-purpose summarization
  • Synthetic data generated by openai/gpt-oss-120b — may not fully reflect real-world distributions for all ecosystems
  • Evaluates single tool observations, not full agent trajectories
  • Max input: 20,000 tokens (training length); can be extended at serving time
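
Inputs longer than the training window should be trimmed before extraction. A minimal sketch using a rough chars-per-token heuristic (my assumption; use the model's tokenizer for an exact count). Keeping the tail is a deliberate choice, since build and test failures usually appear near the end of a log:

```python
def truncate_tool_output(raw: str, max_tokens: int = 20_000, chars_per_token: int = 4) -> str:
    """Conservatively trim tool output to fit the training-length window.

    The chars-per-token ratio is an approximation; swap in a real
    tokenizer count for production use. Keeps the tail of the log.
    """
    budget = max_tokens * chars_per_token
    return raw if len(raw) <= budget else raw[-budget:]

log = "x" * 100_000
print(len(truncate_tool_output(log, max_tokens=1_000)))  # 4000
```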

License

Apache 2.0

Citation

@misc{kovacs2026squeez,
    title={Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents},
    author={Adam Kovacs},
    year={2026},
    url={https://github.com/KRLabsOrg/squeez}
}