screenpipe-pii-image-redactor
A screenpipe project. The image-modality companion to
screenpipe/pii-redactor.
screenpipe's own image PII detector, trained in-house for this task. It finds and boxes PII regions directly in a screenshot, for the surfaces through which an AI agent sees a user's machine: screen captures, computer-use frames, and app UIs (chat, terminals, settings panes, CRMs, browsers, password managers). It returns pixel-space bounding boxes for 12 canonical PII categories. The model is a ~109 MB ONNX file, and the same file runs on macOS, Windows, and Linux (CoreML / DirectML / CUDA / CPU, selected at load time; no GPU vendor SDKs required at the consumer).
License: CC BY-NC 4.0 (non-commercial). For commercial use — production redaction, SaaS / API embedding, AI-agent privacy middleware, custom fine-tunes — contact [email protected]. See
LICENSE.
Headline numbers
On ScreenLeak (PII-bearing screenshots; a predicted region counts as a match at IoU ≥ 0.30):
| Model | Region zero-leak | Oversmash |
|---|---|---|
| this model ⭐ local | 98.9% | 0.0% |
| Gemini 3.1 Pro | 4.2% | 9.7% |
| GPT-5.5 | 3.2% | 22.6% |
| Google Cloud DLP | 2.6% | 19.4% |
| Claude Opus 4.7 | 2.1% | 35.5% |
| Microsoft Presidio | 0.5% | 48.4% |
Frontier vision models can name what they see, but they cannot draw boxes tight enough to count at IoU 0.30; a small specialized detector pulls decisively ahead. Latency is ~120 ms p50 on Apple Silicon (CoreML). Full methodology and confidence intervals: github.com/screenpipe/screenleak. Try it in your browser: screenpipe.github.io/screenleak/demo.
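To make the IoU ≥ 0.30 threshold concrete, here is a minimal sketch of the standard intersection-over-union computation. The box coordinates are made up for illustration, and whether ScreenLeak's matcher uses exactly this rule is an assumption; see the ScreenLeak repository for the actual methodology.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) pixel boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes, not taken from the benchmark.
truth = (100, 100, 300, 120)  # a 200x20 px text field containing PII
tight = (98, 98, 304, 123)    # a prediction that hugs the field
loose = (50, 60, 400, 200)    # a prediction that covers the field but is far too large

iou_tight = iou(truth, tight)  # about 0.78, counts as a match
iou_loose = iou(truth, loose)  # about 0.08, does not count
```

The loose box fully contains the PII yet still fails the threshold, because its extra area inflates the union. That is how a model can "see" the PII and still score near zero on a region metric.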
In-distribution caveat. The headline is measured on a held-out split matched to the model's training conditions — an upper bound, not a real-screen guarantee. It is strongest on clean, standard app UIs; unusual or low-quality screens may be missed or over-boxed.
What it does
Per-image object detection → [(bbox, label, score)], where each
detection is a region classified into one of the 12 canonical categories
shared with screenpipe/pii-redactor:
private_person, private_email, private_phone, private_address,
private_url, private_company, private_repo, private_handle,
private_channel, private_id, private_date, secret
secret covers passwords, API keys, JWTs, DB connection strings,
PRIVATE-KEY block markers, etc.
Inference
# pip install onnxruntime pillow numpy
import numpy as np
import onnxruntime as ort
from PIL import Image, ImageDraw

CLASSES = ["private_person", "private_email", "private_phone", "private_address",
           "private_url", "private_company", "private_repo", "private_handle",
           "private_channel", "private_id", "private_date", "secret"]
SIZE, THRESH = 512, 0.30

sess = ort.InferenceSession(
    "rfdetr_v11.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)

# Preprocess: resize to SIZE x SIZE, scale to [0, 1], apply ImageNet mean/std, HWC -> NCHW.
img = Image.open("screenshot.png").convert("RGB")
W, H = img.size
arr = np.asarray(img.resize((SIZE, SIZE), Image.BILINEAR), np.float32) / 255.0
arr = ((arr - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]).transpose(2, 0, 1)[None].astype(np.float32)

boxes, logits = sess.run(None, {sess.get_inputs()[0].name: arr})
boxes, logits = boxes[0], logits[0]  # (Q,4) cxcywh normalized · (Q,13)
probs = 1.0 / (1.0 + np.exp(-logits[:, :12]))  # per-class sigmoid (NOT softmax)
score = probs.max(1)  # best class score per query

draw = ImageDraw.Draw(img)  # redact = draw opaque boxes
for q in np.where(score >= THRESH)[0]:
    cx, cy, bw, bh = boxes[q]
    # Normalized center/size -> pixel-space top-left corner in the original image.
    x1, y1 = (cx - bw / 2) * W, (cy - bh / 2) * H
    draw.rectangle([x1, y1, x1 + bw * W, y1 + bh * H], fill=(0, 0, 0))
img.save("screenshot_redacted.png")
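Use solid black, not blur: blurred or pixelated text can often be partially
recovered by deblurring or super-resolution attacks, whereas opaque rectangles
destroy the underlying pixels. The output is 300 detection queries × 13
channels (12 PII classes plus a no-object channel), scored with a per-class sigmoid.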
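If you need the structured `[(bbox, label, score)]` form rather than a redacted image, the raw outputs can be decoded along these lines. This is a sketch under assumptions: the function name `decode` is hypothetical, the mapping of the first 12 channels to `CLASSES` order is assumed from the inference snippet's `[:12]` slice, and clamping boxes to the image bounds is an addition not present in the snippet above. The arrays below are synthetic stand-ins for real model output.

```python
import numpy as np

CLASSES = ["private_person", "private_email", "private_phone", "private_address",
           "private_url", "private_company", "private_repo", "private_handle",
           "private_channel", "private_id", "private_date", "secret"]


def decode(boxes, logits, width, height, thresh=0.30):
    """Turn raw (Q,4) cxcywh boxes and (Q,13) logits into [(bbox, label, score)].

    bbox is (x1, y1, x2, y2) in pixels of the original image, clamped to its bounds.
    """
    # Per-class sigmoid over the PII channels; the last (no-object) channel is ignored.
    probs = 1.0 / (1.0 + np.exp(-logits[:, :len(CLASSES)]))
    cls = probs.argmax(axis=1)
    score = probs.max(axis=1)
    out = []
    for q in np.flatnonzero(score >= thresh):
        cx, cy, bw, bh = boxes[q]
        x1 = float(np.clip((cx - bw / 2) * width, 0, width))
        y1 = float(np.clip((cy - bh / 2) * height, 0, height))
        x2 = float(np.clip((cx + bw / 2) * width, 0, width))
        y2 = float(np.clip((cy + bh / 2) * height, 0, height))
        out.append(((x1, y1, x2, y2), CLASSES[cls[q]], float(score[q])))
    return out


# Synthetic example: 3 queries instead of 300.
boxes = np.array([[0.5, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.4, 0.4],   # extends past the top-left corner
                  [0.9, 0.9, 0.1, 0.1]], np.float32)
logits = np.full((3, 13), -10.0, np.float32)
logits[0, 1] = 4.0    # strong private_email
logits[1, 11] = 2.0   # strong secret
logits[2, 12] = 5.0   # only the no-object channel fires, so this query is dropped

dets = decode(boxes, logits, width=1000, height=800)
```

Here `dets` holds two detections, the second with its box clamped to start at the image origin.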
Latest weights: rfdetr_v11.onnx (512×512 input).
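The inference snippet hard-codes CoreML with a CPU fallback, which suits macOS. On other platforms, one possible approach is to pick providers from whatever the installed onnxruntime build reports. The helper `pick_providers` and its preference order are hypothetical, not part of this project; the provider names and `ort.get_available_providers()` are standard onnxruntime.

```python
# Assumed preference order: accelerators first, CPU last as the universal fallback.
PREFERENCE = [
    "CoreMLExecutionProvider",  # macOS
    "CUDAExecutionProvider",    # NVIDIA GPUs
    "DmlExecutionProvider",     # DirectML on Windows
    "CPUExecutionProvider",
]


def pick_providers(available, preference=PREFERENCE):
    """Return the preferred providers that are actually available, ending with CPU."""
    chosen = [p for p in preference if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# Real usage (requires onnxruntime and the model file):
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "rfdetr_v11.onnx",
#       providers=pick_providers(ort.get_available_providers()),
#   )

# Illustration with made-up availability lists.
mac = pick_providers(["CoreMLExecutionProvider", "CPUExecutionProvider"])
win = pick_providers(["DmlExecutionProvider", "CPUExecutionProvider"])
bare = pick_providers([])
```

Which providers show up as available depends on the onnxruntime package installed, so it is worth logging the chosen list at startup.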
Limitations
- In-distribution headline. 98.9% is the held-out ceiling under matched conditions; real, unusual screens will score lower.
- It's a localizer — don't filter on its class label. It reliably finds PII regions, but its per-region category prediction is not reliable on out-of-distribution screens. Redact every detected region rather than filtering by predicted class.
- Synthetic training data only — no real user data. Validate on your screens before deploying.
- Evaluated on English / Latin-script screens only; CJK, Arabic, and Cyrillic are untested, so run a locale-specific eval first.
- Not a security boundary. Built for honest-user privacy; an adversary who knows the detector exists can craft layouts to evade it (handwritten, embedded-image, or occluded PII).
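The "don't filter on its class label" point can be illustrated with a small sketch. The helper `regions_to_redact` and the detections are hypothetical; the detections follow the `(bbox, label, score)` shape described under "What it does".

```python
def regions_to_redact(detections, thresh=0.30):
    """Keep every box at or above the score threshold; the predicted label is ignored."""
    return [bbox for bbox, _label, score in detections if score >= thresh]


# Made-up detections. Suppose the second region is really an API key
# that the model localized correctly but labeled as a date.
detections = [
    ((400, 360, 600, 440), "private_email", 0.98),
    ((0, 0, 300, 240), "private_date", 0.71),
    ((850, 680, 950, 760), "secret", 0.12),  # below threshold
]

all_regions = regions_to_redact(detections)

# Anti-pattern: filtering by label would let the mislabeled key through.
secrets_only = [b for b, label, s in detections if label == "secret" and s >= 0.30]
```

Redacting by score alone covers both confident regions, while the label filter returns nothing and would leak the mislabeled key.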
License
CC BY-NC 4.0 — non-commercial use only. See NOTICE
for third-party component attributions.
For commercial licensing (production deployment, redistribution rights, SaaS / API embedding, custom fine-tunes for your domain): [email protected].
Citation
@misc{screenpipe-pii-image-redactor-2026,
  title  = {screenpipe-pii-image-redactor: a screen-PII detector for accessibility-aware agents},
  author = {{screenpipe}},
  year   = {2026},
  url    = {https://huggingface.co/screenpipe/pii-image-redactor}
}