Features:
- Edit captions, tags, and ratings directly in the website, or use the API for programmatic access.
- Batch import images and captions, in multiple versions, with unlimited storage in the cloud.
- 5,000 images of free storage (0.5 GB free storage @ 100 KB/image).
- Generate DINOv3 image embeddings with one click in the cloud.
- Attach custom metadata to your images. Use image embeddings to visualize and predict tags and scores across your datasets (see the sketch below).
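As an aside on that last point, here is a minimal sketch of predicting tags from exported embeddings with a linear probe; the synthetic arrays and the scikit-learn classifier are illustrative stand-ins, not the product's API:

```python
# Minimal sketch: linear probe over image embeddings to predict tags.
# The arrays below are synthetic stand-ins for exported embeddings/labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 768))      # (n_images, d) embedding vectors
tags = rng.integers(0, 5, size=500)    # one integer tag id per image

probe = LogisticRegression(max_iter=1000).fit(emb, tags)
predicted = probe.predict(emb)         # tag predictions across the dataset
```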
🧬 Darwin-27B-Opus: 86.9% on GPQA Diamond, World #5, Zero Training

We are excited to share Darwin-27B-Opus, a 27B model that achieved 86.9% on GPQA Diamond, ranking #5 globally on the HuggingFace leaderboard, without a single gradient update.
How? Darwin breeds pretrained models through evolutionary FFN crossbreeding. The father (Qwen3.5-27B) provides the reasoning architecture; the mother (Claude 4.6 Opus Reasoning Distilled) contributes structured chain-of-thought knowledge. CMA-ES automatically discovers optimal per-layer blending ratios; no human tuning required.
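A minimal sketch of the idea, under assumptions: toy per-layer FFN weights stand in for real checkpoints, and `evaluate()` stands in for a real benchmark score. None of this is the actual Darwin pipeline; it only shows CMA-ES searching per-layer blend ratios:

```python
# Minimal sketch of evolutionary FFN crossbreeding via CMA-ES (illustrative).
import cma
import torch

n_layers = 4                                             # toy model depth
father = [torch.randn(8, 8) for _ in range(n_layers)]    # reasoning parent FFNs
mother = [torch.randn(8, 8) for _ in range(n_layers)]    # knowledge parent FFNs

def merge_ffn(ratios):
    """Blend FFN weights per layer; attention would be kept from the father."""
    return [(1 - r) * f + r * m for r, f, m in zip(ratios, father, mother)]

def evaluate(ffn_layers):
    # Stand-in for a GPQA-style benchmark score of the merged model.
    return -sum(layer.abs().mean().item() for layer in ffn_layers)

def fitness(ratios):
    ratios = [min(max(r, 0.0), 1.0) for r in ratios]     # clamp to [0, 1]
    return -evaluate(merge_ffn(ratios))                  # CMA-ES minimizes

es = cma.CMAEvolutionStrategy(n_layers * [0.5], 0.2)     # start at 50/50 blends
for _ in range(20):                                      # a few generations
    candidates = es.ask()
    es.tell(candidates, [fitness(c) for c in candidates])
print("best per-layer ratios:", es.result.xbest)
```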
The result surpasses the original Qwen3.5-27B (85.5%), GLM-5.1 (744B, 86.2%), and Qwen3.5-122B (86.6%). A 27B model outperforming 744B, with zero training, zero data, one GPU, and ~2 hours.
We also confirmed hybrid vigor on Korean benchmarks: Darwin-27B-KR (2nd generation offspring) surpassed both parents on CLIcK, winning 7 out of 11 categories. The evolutionary optimizer independently assigned 93% of the FFN from the Korean-specialized mother while preserving 93% of the attention from the reasoning-specialized father, autonomously validating our core principle: FFN carries knowledge, attention carries reasoning.
Public release: 10 days in, 300+ community derivatives and 120K+ downloads.
We got Qwen 3.5 to count the Rs in "strawberry" correctly!
Building on Sawtone, we've been testing a different way to feed language into an LLM to build the next generation of multilingual AI.
The usual setup gives the model tokenized text and asks it to perform various linguistic tasks. That works surprisingly well, until it doesn't. Accents disappear. Words get mangled. Internal structure gets blurred away. And the cost of that gets higher once you move into multilingual and lower-resource settings.
So we tried adding a second path.
In addition to the normal text input, the model also receives Sawtone: a byte-level word representation that preserves how a word is written, how it sounds, and how it is structured.
Same LLM. Better interface.
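A minimal sketch of the dual-path interface, under assumptions: a toy byte-level encoder stands in for Sawtone, and its word vectors are simply added to the token embeddings, with word-to-token alignment assumed handled upstream. This is illustrative, not our actual architecture:

```python
# Minimal sketch: token embeddings plus a byte-level "second path".
import torch
import torch.nn as nn

class ByteWordEncoder(nn.Module):
    """Toy stand-in for Sawtone: raw UTF-8 bytes -> one vector per word."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.byte_emb = nn.Embedding(256, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, word: str) -> torch.Tensor:
        ids = torch.tensor([list(word.encode("utf-8"))])
        _, h = self.rnn(self.byte_emb(ids))
        return h[-1, 0]                 # (dim,) vector preserving written form

class DualPathEmbedding(nn.Module):
    """Usual token embeddings plus a projected byte-level path."""
    def __init__(self, vocab: int, dim: int, byte_dim: int = 64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(byte_dim, dim)

    def forward(self, token_ids: torch.Tensor, word_vecs: torch.Tensor):
        # word_vecs: one byte-level vector per token position (pre-aligned).
        return self.token_emb(token_ids) + self.proj(word_vecs)

enc = ByteWordEncoder()
emb = DualPathEmbedding(vocab=32000, dim=128)
tokens = torch.tensor([[17, 42, 99]])
words = torch.stack([enc("café")] * 3).unsqueeze(0)
hidden = emb(tokens, words)             # fed into the same LLM as usual
```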
In this proof of concept with Qwen 3.5 0.8B, that pushed our eval from 64% to 88%. The gains showed up exactly where tokenized models usually get shaky: diacritics, character order, exact spelling, and other form-sensitive behavior.
Sawtone itself is tokenizer-free, byte-level, and pre-trained across 507 languages.
Still early, but promising!
Ran a small controlled study on a frozen 40-task slice of Harbor Terminal-Bench-Pro, using the same model (minimax/minimax-m2.5) with two agent harnesses: Goose and OpenHands-SDK.
Under the base setup, reducing the turn budget from 100 to 60 pushed the two harnesses in opposite directions: Goose improved while OpenHands-SDK regressed.
A tweaked 60-turn setup brought OpenHands-SDK back to 0.575. At their best, both harnesses reached the same 0.575 pass rate.
What surprised me most was the token profile: in this setup, the reported token usage for OpenHands-SDK was dramatically higher than Goose's, even while converging to the same best score.
Same model, same task slice, different harness behavior under a tighter interaction budget.
A collection of various compression schemes for Gemma 4, and for the abliterated v1 dense models, is now available on the Hub. Check it out via the links below.
What happens when PII masking is treated as a trainable behavior, not just a detection task?
A new reinforcement learning environment tackles this question using a dataset derived from ai4privacy/open-pii-masking-500k-ai4privacy, transformed into a verifier-based training and evaluation setup.
Instead of evaluating PII masking as a one-off redaction step, this environment frames privacy as something models must consistently optimize for under feedback. The task requires models to correctly identify sensitive spans, replace them with [PII] tags, and comply with strict output formatting β all scored through explicit reward signals.
To make this realistic, the author filtered and normalized the dataset to focus on US-English examples, ensuring consistent masking targets while preserving the structural diversity needed to expose failure modes.
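A minimal sketch of what a verifier-based reward could look like, assuming each example carries its gold sensitive spans; the weighting and function names are illustrative, not the environment's actual scoring:

```python
# Minimal sketch of a verifier-style reward for PII masking (illustrative).
import re

def pii_reward(output: str, source: str, gold_spans: list[str]) -> float:
    # Coverage: every gold span must be absent from the output.
    leaked = sum(span in output for span in gold_spans)
    coverage = 1.0 - leaked / max(len(gold_spans), 1)

    # Exactness: expected output replaces each span with a [PII] tag.
    expected = source
    for span in gold_spans:
        expected = expected.replace(span, "[PII]")
    exact = 1.0 if output == expected else 0.0

    # Formatting: brackets may only appear as literal [PII] tags.
    format_ok = re.fullmatch(r"[^\[\]]*(?:\[PII\][^\[\]]*)*", output) is not None

    return 0.5 * coverage + 0.3 * exact + 0.2 * float(format_ok)

print(pii_reward("Call [PII] at [PII].",
                 "Call Jane Doe at 555-0100.",
                 ["Jane Doe", "555-0100"]))   # -> 1.0
```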
What's notable here isn't just the environment itself, but the shift in perspective.
By turning PII masking into a reinforcement learning problem, privacy stops being a static rule and becomes a behavior models are trained to maintain even under optimization pressure.
This is a strong example of how open privacy datasets can move beyond benchmarks and become infrastructure for new learning paradigms.
Gemma-4-26B-A4B refuses to believe that web search results are real and is convinced that all search results are simulated or hallucinated. It also thinks that I, the user, might be simulated or hallucinating.
Just one more step till AGI at home!
We introduce Awesome Multimodal Modeling, a curated repository tracing the architectural evolution of multimodal intelligence: from foundational fusion to native omni-models.
Taxonomy & Evolution:
- Traditional Multimodal Learning → Foundational work on representation, fusion, and alignment.
- Multimodal LLMs (MLLMs) → Architectures connecting vision encoders to LLMs for understanding.
- Unified Multimodal Models (UMMs) → Models unifying Understanding + Generation via Diffusion, Autoregressive, or Hybrid paradigms.
- Native Multimodal Models (NMMs) → Models trained from scratch on all modalities; contrasts early vs. late fusion under scaling laws.

Key Distinction: UMMs unify tasks via generation heads; NMMs enforce interleaving through joint pre-training.
Article highlight: *Rights Under Lightspeed* (art-60-061, v0.1)
TL;DR: This article reframes "AI rights" as a *runtime governance problem*, not a metaphysical debate.
In a slow-light universe, centralized approval can become physically impossible. When latency and partitions block round-trip control, some node must be predelegated bounded local discretion. In SI terms, those "rights" are *bounded autonomy envelopes*: explicit effect permissions with scope, gates, budgets, auditability, and rollback.
Why it matters:
• moves the AI-rights discussion from sentiment to system design
• explains why physics can force local autonomy under high RTT or partitions
• treats rights and governance as duals: *discretion on one side, proof/rollback on the other*
• gives a practical ladder from proposal-only systems to governed autonomous SI nodes
What's inside:
• "rights" as *operational rights / discretion budgets*
• mapping from rights tiers to *SI-Core conformance + RML maturity*
• deep-space latency as the clearest stress case
• *autonomy envelopes* as typed, scoped, rate-limited, auditable permission objects (see the sketch after this list)
• a migration path from *LLM wrappers* to governed autonomous nodes
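A minimal sketch of an autonomy envelope as a typed permission object, following the description above; all field names and the budget/gate mechanics are illustrative assumptions, not the article's actual schema:

```python
# Minimal sketch: a typed, scoped, budgeted, auditable permission object.
from dataclasses import dataclass, field

@dataclass
class AutonomyEnvelope:
    holder: str                       # node granted bounded local discretion
    scope: set[str]                   # explicit effect permissions
    budget: int                       # remaining actions before re-approval
    gates: list[str]                  # preconditions checked before acting
    audit_log: list[str] = field(default_factory=list)

    def permit(self, action: str) -> bool:
        ok = action in self.scope and self.budget > 0
        if ok:
            self.budget -= 1
        self.audit_log.append(f"{action}: {'allowed' if ok else 'denied'}")
        return ok

# A probe predelegated a narrow envelope for when round trips are blocked.
env = AutonomyEnvelope(holder="relay-probe-7",
                       scope={"antenna.retarget", "orbit.trim"},
                       budget=3,
                       gates=["link_down > 20min"])
print(env.permit("orbit.trim"))      # True: in scope, budget decremented, logged
print(env.permit("reactor.scram"))   # False: outside the envelope's scope
```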
Key idea: In distributed worlds, "AI rights" stop being a moral trophy question and become an engineering question:
*What discretion must a node hold to do its job under physics, and what governance makes that safe?*
I've built a system to make open-source contributions easier to understand across repositories.
It:
- aggregates merged external PRs (reviewed by maintainers)
- structures them into a single contributions.md
- adds a lightweight AI layer to query patterns and impact
The idea is to move from scattered PRs to a readable changelog of work.
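Not the actual implementation, but a minimal sketch of the aggregation step using the public GitHub REST API; the external-author heuristic and the changelog layout are assumptions:

```python
# Minimal sketch: collect merged external PRs into contributions.md.
import requests

def merged_external_prs(owner: str, repo: str) -> list[dict]:
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls"
    resp = requests.get(url, params={"state": "closed", "per_page": 100})
    resp.raise_for_status()
    # Merged PRs whose authors are not owners/members count as external here.
    return [p for p in resp.json()
            if p.get("merged_at")
            and p.get("author_association") not in ("OWNER", "MEMBER")]

def write_changelog(repos: list[tuple[str, str]], path: str = "contributions.md"):
    with open(path, "w") as f:
        for owner, repo in repos:
            f.write(f"## {owner}/{repo}\n\n")
            for p in merged_external_prs(owner, repo):
                f.write(f"- [#{p['number']}]({p['html_url']}) {p['title']} "
                        f"(@{p['user']['login']})\n")
            f.write("\n")

write_changelog([("huggingface", "transformers")])
```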
Running Hugging Face Spaces on Arm64? In this live code-along with Docker, we'll walk through how to assess compatibility, spot architecture-specific issues, and migrate a real Space using the Docker MCP Toolkit, the Arm MCP Server, and GitHub Copilot. We'll use the ACE-Step v1.5 model as a working example. Watch live: https://www.youtube.com/live/rcmmBi-qosA?si=Hn9TfpYw7_XlWWEY
Experimental global target bits-per-weight quantization of Qwen/Qwen3.5-4B and Qwen/Qwen3.5-9B
Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters the most, and produces high quality models that meet a precise global file size target.
Key Advantages:
- VRAM Maximization: can generate high-quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24 GB of VRAM).
- Data-Driven Precision: the quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs (see the sketch below).
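A minimal sketch of the budgeted allocation idea, not the actual implementation: greedily upgrade the tensors whose measured quantization error drops the most per extra bit, until the global bits-per-weight target is spent. The uniform-quantization error proxy and candidate bit-widths are illustrative assumptions:

```python
# Minimal sketch: greedy per-tensor bit allocation under a global BPW budget.
import numpy as np

def quant_error(w: np.ndarray, bits: int) -> float:
    # Toy proxy: MSE of uniform quantization at the given bit-width.
    scale = (w.max() - w.min()) / (2**bits - 1)
    q = np.round((w - w.min()) / scale) * scale + w.min()
    return float(((w - q) ** 2).sum())

def allocate_bits(tensors: dict, target_bpw: float, widths=(2, 3, 4, 5, 6, 8)):
    total = sum(w.size for w in tensors.values())
    budget = target_bpw * total                      # total bits available
    bits = {name: widths[0] for name in tensors}     # start everything small
    spent = sum(bits[n] * w.size for n, w in tensors.items())
    while True:
        best = None
        for n, w in tensors.items():
            nxt = next((b for b in widths if b > bits[n]), None)
            if nxt is None or spent + (nxt - bits[n]) * w.size > budget:
                continue
            # Error reduction per extra bit spent on this tensor.
            gain = (quant_error(w, bits[n]) - quant_error(w, nxt)) \
                   / ((nxt - bits[n]) * w.size)
            if best is None or gain > best[0]:
                best = (gain, n, nxt)
        if best is None:
            return bits                              # budget exhausted
        _, n, nxt = best
        spent += (nxt - bits[n]) * tensors[n].size
        bits[n] = nxt

rng = np.random.default_rng(0)
tensors = {f"layer{i}.weight": rng.normal(size=(64, 64)) for i in range(4)}
print(allocate_bits(tensors, target_bpw=4.5))
```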
Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards.