Activity Feed


Recent Activity

Aurelien-Morgan 
posted an update 12 days ago
@retrain-pipelines v0.2.0 is out!
I'm at my booth at Station F with GOSIM Paris 2026 today and tomorrow.
Come meet me for a live, in-person demo and a chat!
mlabonne 
posted an update 20 days ago
Big update to llm-datasets, my curated list of datasets and tools for post-training LLMs.

> Added many new datasets
> New "thinking" column
> Refreshed recommended tools

Thanks to everyone who told me they used it for their research at ICLR, you motivated this update!
anakin87 
posted an update 24 days ago
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:

1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆
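The tunable opponent from steps 4-5 and the deterministic game reward can be sketched in plain Python. This is an illustrative toy, not the actual Verifiers env; all function names here are my own:

```python
import random

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def winner(board):
    """board: 9 cells, each 'X', 'O' or ' '. Returns 'X', 'O' or None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def reward(board, agent="X"):
    """Deterministic, verifiable reward: +1 win, -1 loss, 0 otherwise."""
    w = winner(board)
    return 0.0 if w is None else (1.0 if w == agent else -1.0)

def greedy_move(board, me="O", foe="X"):
    """Win if possible, else block the opponent, else first empty cell."""
    empties = [i for i, c in enumerate(board) if c == " "]
    for mark in (me, foe):
        for i in empties:
            trial = board[:]
            trial[i] = mark
            if winner(trial) == mark:
                return i
    return empties[0]

def opponent_move(board, p_random, rng):
    """Tunable-skill opponent: uniform random move with probability
    p_random (0.2-0.7 in stage 4, 0.0-0.25 in stage 5), greedy otherwise."""
    empties = [i for i, c in enumerate(board) if c == " "]
    if rng.random() < p_random:
        return rng.choice(empties)
    return greedy_move(board)
```

Dialing `p_random` down between the two RL stages is what makes the curriculum progressively harder.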

---

🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe

📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
anakin87 
posted an update 25 days ago
Local Gemma 4 agent 💎🕵️🗺️
Drop in a mysterious map and get the location, live weather, and top spots to visit.

I've been exploring what google/gemma-4-E4B-it can do in a local agentic setup and put together a 📓 𝙣𝙤𝙩𝙚𝙗𝙤𝙤𝙠 with Gemma + Haystack AI Framework covering 4 demos.

📓 https://t.ly/04Ty5

Another interesting one is the 𝗚𝗶𝘁𝗛𝘂𝗯 𝗔𝗴𝗲𝗻𝘁.

I initially tried to load all tools from the GitHub MCP server, which quickly filled the available context on Colab -> an unusable, forgetful agent ❌

Then I used the 𝗦𝗲𝗮𝗿𝗰𝗵𝗮𝗯𝗹𝗲 𝗧𝗼𝗼𝗹𝘀𝗲𝘁 🔎 🧰
It dynamically discovers the right tools from the GitHub MCP server on the fly, loading only what it actually needs for the task at hand, keeping context lean.

Now it actually works.
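The idea behind a searchable toolset can be sketched with naive keyword matching. This is an illustrative toy, not the actual Haystack/MCP API; a real implementation would use embeddings or the server's own search:

```python
def search_tools(tool_descriptions, query, top_k=3):
    """Rank tools by keyword overlap between their description and the
    task, so the agent only loads the few tools it actually needs
    instead of the whole MCP catalogue."""
    query_words = set(query.lower().split())
    scored = sorted(
        tool_descriptions.items(),
        key=lambda item: -len(query_words & set(item[1].lower().split())),
    )
    return [name for name, _ in scored[:top_k]]
```

Only the returned tool names get their full schemas loaded into the prompt, which is what keeps the context lean.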

The notebook also contains
💎 Multimodal weather agent: the mystery map demo above
💎 Visual Question Answering from a paper
💎 RAG on Rock music
anakin87 
posted an update 27 days ago
How does LLM training with RL environments work?

It all starts with 𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗩𝗲𝗿𝗶𝗳𝗶𝗮𝗯𝗹𝗲 𝗥𝗲𝘄𝗮𝗿𝗱𝘀
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training


In this setup, the environment is simple: fixed questions and answers, rollout logic, reward(s)
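That verifiable check can be sketched in a few lines, assuming answers are wrapped in `<answer>` tags (the tag format is my assumption for illustration, not necessarily the one used here):

```python
import re

def verifiable_reward(completion, ground_truth):
    """1.0 if the extracted answer matches the ground truth, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # unparseable output earns no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
```

Because the check is deterministic, the reward signal is cheap and impossible to game with plausible-sounding text alone.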

Consider a more complex tic-tac-toe env ❌⭕
It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions

(envs can also include tools)

---

What happens at training?

We use 𝗚𝗿𝗼𝘂𝗽 𝗥𝗲𝗹𝗮𝘁𝗶𝘃𝗲 𝗣𝗼𝗹𝗶𝗰𝘆 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻 with a tic-tac-toe env

No critic model is needed: the group itself is the baseline.
Simpler than PPO.

1️⃣ Rollout generation: from the same board, model plays N games via sampling
2️⃣ Each game scored with deterministic rewards (win, format, ...)
3️⃣ Mean score computed across the group
4️⃣ Each rollout's advantage = its score minus the group mean
5️⃣ Model updated to favor trajectories above baseline

🔁 Repeat
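Steps 3-4 boil down to a few lines. This is a toy sketch of the group-relative baseline only, not a full GRPO implementation:

```python
def group_advantages(rewards):
    """GRPO's critic-free advantage: each rollout's score minus the
    mean score of its group of N rollouts from the same board."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]
```

Rollouts above the group mean get a positive advantage and are reinforced; those below it are pushed down.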


For a deep dive, check out
🌱 https://github.com/anakin87/llm-rl-environments-lil-course
a free hands-on course on RL environments for LLMs
anakin87 
posted an update about 1 month ago
Your RL environment is an SFT data factory 🏭

In LLM post-training, it's common to do a Supervised Fine-Tuning warm-up before Reinforcement Learning.

When teaching a new task, RL needs some signal to amplify, and SFT builds a good initial basis, for example by teaching the output format.


If you've built an RL env, generating SFT synthetic data is basically free.

An env already has: task data, rollout logic, rewards.

1️⃣ pick a strong model
2️⃣ run it through the env
3️⃣ filter rollouts by reward

works out of the box with Verifiers (Prime Intellect) and Atropos (Nous Research)
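The three steps can be sketched as follows. `env_rollout` and its `(messages, reward)` return shape are hypothetical stand-ins for whatever your env actually exposes:

```python
def collect_sft_data(env_rollout, prompts, threshold=1.0):
    """Run a strong model through the env and keep only high-reward
    rollouts as synthetic SFT examples."""
    dataset = []
    for prompt in prompts:                  # 2. run it through the env
        messages, reward = env_rollout(prompt)
        if reward >= threshold:             # 3. filter rollouts by reward
            dataset.append(messages)
    return dataset
```

The reward filter is the whole trick: the env's existing scoring logic doubles as the quality gate for the SFT dataset.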

🧑‍💻 Example: https://github.com/anakin87/llm-rl-environments-lil-course/blob/main/chapters/05.md
Aurelien-Morgan 
posted an update about 1 month ago