Features:
- Edit captions, tags, and ratings directly in the website, or use the API for programmatic access.
- Batch import images and captions, in multiple versions, with unlimited storage in the cloud.
- 5,000 images of free storage (0.5 GB free storage @ 100 KB/image).
- Generate DINOv3 image embeddings with one click in the cloud.
- Attach custom metadata to your images. Use image embeddings to visualize and predict tags and scores across your datasets (see the sketch below).
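As an aside on that last point, here is a minimal sketch of predicting tags from exported embeddings with a linear probe; the synthetic arrays and the scikit-learn classifier are illustrative stand-ins, not the product's API:

```python
# Minimal sketch: linear probe over image embeddings to predict tags.
# The arrays below are synthetic stand-ins for exported embeddings/labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 768))      # (n_images, d) embedding vectors
tags = rng.integers(0, 5, size=500)    # one integer tag id per image

probe = LogisticRegression(max_iter=1000).fit(emb, tags)
predicted = probe.predict(emb)         # tag predictions across the dataset
```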
🧬 Darwin-27B-Opus: 86.9% on GPQA Diamond, World #5, Zero Training

We are excited to share Darwin-27B-Opus, a 27B model that achieved 86.9% on GPQA Diamond, ranking #5 globally on the HuggingFace leaderboard, without a single gradient update.
How? Darwin breeds pretrained models through evolutionary FFN crossbreeding. The father (Qwen3.5-27B) provides the reasoning architecture; the mother (Claude 4.6 Opus Reasoning Distilled) contributes structured chain-of-thought knowledge. CMA-ES automatically discovers optimal per-layer blending ratios; no human tuning required.
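A minimal sketch of the idea, under assumptions: toy per-layer FFN weights stand in for real checkpoints, and `evaluate()` stands in for a real benchmark score. None of this is the actual Darwin pipeline; it only shows CMA-ES searching per-layer blend ratios:

```python
# Minimal sketch of evolutionary FFN crossbreeding via CMA-ES (illustrative).
import cma
import torch

n_layers = 4                                             # toy model depth
father = [torch.randn(8, 8) for _ in range(n_layers)]    # reasoning parent FFNs
mother = [torch.randn(8, 8) for _ in range(n_layers)]    # knowledge parent FFNs

def merge_ffn(ratios):
    """Blend FFN weights per layer; attention would be kept from the father."""
    return [(1 - r) * f + r * m for r, f, m in zip(ratios, father, mother)]

def evaluate(ffn_layers):
    # Stand-in for a GPQA-style benchmark score of the merged model.
    return -sum(layer.abs().mean().item() for layer in ffn_layers)

def fitness(ratios):
    ratios = [min(max(r, 0.0), 1.0) for r in ratios]     # clamp to [0, 1]
    return -evaluate(merge_ffn(ratios))                  # CMA-ES minimizes

es = cma.CMAEvolutionStrategy(n_layers * [0.5], 0.2)     # start at 50/50 blends
for _ in range(20):                                      # a few generations
    candidates = es.ask()
    es.tell(candidates, [fitness(c) for c in candidates])
print("best per-layer ratios:", es.result.xbest)
```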
The result surpasses the original Qwen3.5-27B (85.5%), GLM-5.1 (744B, 86.2%), and Qwen3.5-122B (86.6%). A 27B model outperforming 744B, with zero training, zero data, one GPU, and ~2 hours.
We also confirmed hybrid vigor on Korean benchmarks: Darwin-27B-KR (2nd generation offspring) surpassed both parents on CLIcK, winning 7 out of 11 categories. The evolutionary optimizer independently assigned 93% of the FFN from the Korean-specialized mother while preserving 93% of the attention from the reasoning-specialized father, autonomously validating our core principle: FFN carries knowledge, attention carries reasoning.
Public release: 10 days in, 300+ community derivatives and 120K+ downloads.
We got Qwen 3.5 to count the Rs in "strawberry" correctly!
Building on Sawtone, we've been testing a different way to feed language into an LLM to build the next generation of multilingual AI.
The usual setup gives the model tokenized text and asks it to perform various linguistic tasks. That works surprisingly well, until it doesn't. Accents disappear. Words get mangled. Internal structure gets blurred away. And the cost of that gets higher once you move into multilingual and lower-resource settings.
So we tried adding a second path.
In addition to the normal text input, the model also receives Sawtone: a byte-level word representation that preserves how a word is written, how it sounds, and how it is structured.
Same LLM. Better interface.
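A minimal sketch of the dual-path interface, under assumptions: a toy byte-level encoder stands in for Sawtone, and its word vectors are simply added to the token embeddings, with word-to-token alignment assumed handled upstream. This is illustrative, not our actual architecture:

```python
# Minimal sketch: token embeddings plus a byte-level "second path".
import torch
import torch.nn as nn

class ByteWordEncoder(nn.Module):
    """Toy stand-in for Sawtone: raw UTF-8 bytes -> one vector per word."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.byte_emb = nn.Embedding(256, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, word: str) -> torch.Tensor:
        ids = torch.tensor([list(word.encode("utf-8"))])
        _, h = self.rnn(self.byte_emb(ids))
        return h[-1, 0]                 # (dim,) vector preserving written form

class DualPathEmbedding(nn.Module):
    """Usual token embeddings plus a projected byte-level path."""
    def __init__(self, vocab: int, dim: int, byte_dim: int = 64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(byte_dim, dim)

    def forward(self, token_ids: torch.Tensor, word_vecs: torch.Tensor):
        # word_vecs: one byte-level vector per token position (pre-aligned).
        return self.token_emb(token_ids) + self.proj(word_vecs)

enc = ByteWordEncoder()
emb = DualPathEmbedding(vocab=32000, dim=128)
tokens = torch.tensor([[17, 42, 99]])
words = torch.stack([enc("café")] * 3).unsqueeze(0)
hidden = emb(tokens, words)             # fed into the same LLM as usual
```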
In this proof of concept with Qwen 3.5 0.8B, that pushed our eval from 64% to 88%. The gains showed up exactly where tokenized models usually get shaky: diacritics, character order, exact spelling, and other form-sensitive behavior.
Sawtone itself is tokenizer-free, byte-level, and pre-trained across 507 languages.
Still early, but promising!
Ran a small controlled study on a frozen 40-task slice of Harbor Terminal-Bench-Pro, using the same model (minimax/minimax-m2.5) with two agent harnesses: Goose and OpenHands-SDK.
Under the base setup, reducing the turn budget from 100 to 60 pushed the two harnesses in opposite directions: Goose improved while OpenHands-SDK regressed.
A tweaked 60-turn setup brought OpenHands-SDK back to 0.575. At their best, both harnesses reached the same 0.575 pass rate.
What surprised me most was the token profile: in this setup, the reported token usage for OpenHands-SDK was dramatically higher than Goose's, even while converging to the same best score.
Same model, same task slice, different harness behavior under a tighter interaction budget.
A collection of various compression schemes for Gemma 4, and for the abliterated v1 dense models, is now available on the Hub. Check it out via the links below.
What happens when PII masking is treated as a trainable behavior, not just a detection task?
A new reinforcement learning environment tackles this question using a dataset derived from ai4privacy/open-pii-masking-500k-ai4privacy, transformed into a verifier-based training and evaluation setup.
Instead of evaluating PII masking as a one-off redaction step, this environment frames privacy as something models must consistently optimize for under feedback. The task requires models to correctly identify sensitive spans, replace them with [PII] tags, and comply with strict output formatting β all scored through explicit reward signals.
To make this realistic, the author filtered and normalized the dataset to focus on US-English examples, ensuring consistent masking targets while preserving the structural diversity needed to expose failure modes.
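A minimal sketch of what a verifier-based reward could look like, assuming each example carries its gold sensitive spans; the weighting and function names are illustrative, not the environment's actual scoring:

```python
# Minimal sketch of a verifier-style reward for PII masking (illustrative).
import re

def pii_reward(output: str, source: str, gold_spans: list[str]) -> float:
    # Coverage: every gold span must be absent from the output.
    leaked = sum(span in output for span in gold_spans)
    coverage = 1.0 - leaked / max(len(gold_spans), 1)

    # Exactness: expected output replaces each span with a [PII] tag.
    expected = source
    for span in gold_spans:
        expected = expected.replace(span, "[PII]")
    exact = 1.0 if output == expected else 0.0

    # Formatting: brackets may only appear as literal [PII] tags.
    format_ok = re.fullmatch(r"[^\[\]]*(?:\[PII\][^\[\]]*)*", output) is not None

    return 0.5 * coverage + 0.3 * exact + 0.2 * float(format_ok)

print(pii_reward("Call [PII] at [PII].",
                 "Call Jane Doe at 555-0100.",
                 ["Jane Doe", "555-0100"]))   # -> 1.0
```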
What's notable here isn't just the environment itself, but the shift in perspective.
By turning PII masking into a reinforcement learning problem, privacy stops being a static rule and becomes a behavior models are trained to maintain even under optimization pressure.
This is a strong example of how open privacy datasets can move beyond benchmarks and become infrastructure for new learning paradigms.
Gemma-4-26B-A4B refuses to believe that web search results are real and is convinced that all search results are simulated or hallucinated. It also thinks that I, the user, might be simulated or hallucinating.
Just one more step till AGI at home!
We introduce Awesome Multimodal Modeling, a curated repository tracing the architectural evolution of multimodal intelligence: from foundational fusion to native omni-models.
Taxonomy & Evolution:
- Traditional Multimodal Learning → Foundational work on representation, fusion, and alignment.
- Multimodal LLMs (MLLMs) → Architectures connecting vision encoders to LLMs for understanding.
- Unified Multimodal Models (UMMs) → Models unifying Understanding + Generation via Diffusion, Autoregressive, or Hybrid paradigms.
- Native Multimodal Models (NMMs) → Models trained from scratch on all modalities; contrasts early vs. late fusion under scaling laws.

Key Distinction: UMMs unify tasks via generation heads; NMMs enforce interleaving through joint pre-training.
Article highlight: *Rights Under Lightspeed* (art-60-061, v0.1)
TL;DR: This article reframes "AI rights" as a *runtime governance problem*, not a metaphysical debate.
In a slow-light universe, centralized approval can become physically impossible. When latency and partitions block round-trip control, some node must be predelegated bounded local discretion. In SI terms, those "rights" are *bounded autonomy envelopes*: explicit effect permissions with scope, gates, budgets, auditability, and rollback.
Why it matters:
• moves the AI-rights discussion from sentiment to system design
• explains why physics can force local autonomy under high RTT or partitions
• treats rights and governance as duals: *discretion on one side, proof/rollback on the other*
• gives a practical ladder from proposal-only systems to governed autonomous SI nodes
What's inside:
• "rights" as *operational rights / discretion budgets*
• mapping from rights tiers to *SI-Core conformance + RML maturity*
• deep-space latency as the clearest stress case
• *autonomy envelopes* as typed, scoped, rate-limited, auditable permission objects (see the sketch after this list)
• a migration path from *LLM wrappers* to governed autonomous nodes
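A minimal sketch of an autonomy envelope as a typed permission object, following the description above; all field names and the budget/gate mechanics are illustrative assumptions, not the article's actual schema:

```python
# Minimal sketch: a typed, scoped, budgeted, auditable permission object.
from dataclasses import dataclass, field

@dataclass
class AutonomyEnvelope:
    holder: str                       # node granted bounded local discretion
    scope: set[str]                   # explicit effect permissions
    budget: int                       # remaining actions before re-approval
    gates: list[str]                  # preconditions checked before acting
    audit_log: list[str] = field(default_factory=list)

    def permit(self, action: str) -> bool:
        ok = action in self.scope and self.budget > 0
        if ok:
            self.budget -= 1
        self.audit_log.append(f"{action}: {'allowed' if ok else 'denied'}")
        return ok

# A probe predelegated a narrow envelope for when round trips are blocked.
env = AutonomyEnvelope(holder="relay-probe-7",
                       scope={"antenna.retarget", "orbit.trim"},
                       budget=3,
                       gates=["link_down > 20min"])
print(env.permit("orbit.trim"))      # True: in scope, budget decremented, logged
print(env.permit("reactor.scram"))   # False: outside the envelope's scope
```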
Key idea: In distributed worlds, "AI rights" stop being a moral trophy question and become an engineering question:
*What discretion must a node hold to do its job under physics, and what governance makes that safe?*
I've built a system to make open-source contributions easier to understand across repositories.
It:
- aggregates merged external PRs (reviewed by maintainers)
- structures them into a single contributions.md
- adds a lightweight AI layer to query patterns and impact
The idea is to move from scattered PRs to a readable changelog of work.
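Not the actual implementation, but a minimal sketch of the aggregation step using the public GitHub REST API; the external-author heuristic and the changelog layout are assumptions:

```python
# Minimal sketch: collect merged external PRs into contributions.md.
import requests

def merged_external_prs(owner: str, repo: str) -> list[dict]:
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls"
    resp = requests.get(url, params={"state": "closed", "per_page": 100})
    resp.raise_for_status()
    # Merged PRs whose authors are not owners/members count as external here.
    return [p for p in resp.json()
            if p.get("merged_at")
            and p.get("author_association") not in ("OWNER", "MEMBER")]

def write_changelog(repos: list[tuple[str, str]], path: str = "contributions.md"):
    with open(path, "w") as f:
        for owner, repo in repos:
            f.write(f"## {owner}/{repo}\n\n")
            for p in merged_external_prs(owner, repo):
                f.write(f"- [#{p['number']}]({p['html_url']}) {p['title']} "
                        f"(@{p['user']['login']})\n")
            f.write("\n")

write_changelog([("huggingface", "transformers")])
```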
Running Hugging Face Spaces on Arm64? In this live code-along with Docker, we'll walk through how to assess compatibility, spot architecture-specific issues, and migrate a real Space using the Docker MCP Toolkit, the Arm MCP Server, and GitHub Copilot. We'll use the ACE-Step v1.5 model as a working example. Watch live: https://www.youtube.com/live/rcmmBi-qosA?si=Hn9TfpYw7_XlWWEY
Experimental global target bits-per-weight quantization of Qwen/Qwen3.5-4B and Qwen/Qwen3.5-9B
Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters the most, and produces high quality models that meet a precise global file size target.
Key Advantages:
- VRAM Maximization: can generate high-quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24 GB of VRAM).
- Data-Driven Precision: the quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs (see the sketch below).
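A minimal sketch of the budgeted allocation idea, not the actual implementation: greedily upgrade the tensors whose measured quantization error drops the most per extra bit, until the global bits-per-weight target is spent. The uniform-quantization error proxy and candidate bit-widths are illustrative assumptions:

```python
# Minimal sketch: greedy per-tensor bit allocation under a global BPW budget.
import numpy as np

def quant_error(w: np.ndarray, bits: int) -> float:
    # Toy proxy: MSE of uniform quantization at the given bit-width.
    scale = (w.max() - w.min()) / (2**bits - 1)
    q = np.round((w - w.min()) / scale) * scale + w.min()
    return float(((w - q) ** 2).sum())

def allocate_bits(tensors: dict, target_bpw: float, widths=(2, 3, 4, 5, 6, 8)):
    total = sum(w.size for w in tensors.values())
    budget = target_bpw * total                      # total bits available
    bits = {name: widths[0] for name in tensors}     # start everything small
    spent = sum(bits[n] * w.size for n, w in tensors.items())
    while True:
        best = None
        for n, w in tensors.items():
            nxt = next((b for b in widths if b > bits[n]), None)
            if nxt is None or spent + (nxt - bits[n]) * w.size > budget:
                continue
            # Error reduction per extra bit spent on this tensor.
            gain = (quant_error(w, bits[n]) - quant_error(w, nxt)) \
                   / ((nxt - bits[n]) * w.size)
            if best is None or gain > best[0]:
                best = (gain, n, nxt)
        if best is None:
            return bits                              # budget exhausted
        _, n, nxt = best
        spent += (nxt - bits[n]) * tensors[n].size
        bits[n] = nxt

rng = np.random.default_rng(0)
tensors = {f"layer{i}.weight": rng.normal(size=(64, 64)) for i in range(4)}
print(allocate_bits(tensors, target_bpw=4.5))
```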
Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards.