In a Training Loop 🔄
Stefano Fiorucci (PRO)
anakin87
189 followers · 89 following
theanakin87
anakin87
stefano-fiorucci
AI & ML interests
Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to Haystack LLM framework 🏗️
Recent Activity
Upvoted an article · 3 days ago
ML Intern Takes Our Post-Training Internship Test
Reacted to their post with ❤️ · 3 days ago
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe.

I took https://huggingface.co/LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:
1️⃣ Build a solid RL environment with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: fewer than 200 games sampled from GPT-5-mini playing in the environment
3️⃣ SFT warm-up to teach the output format
4️⃣ Group-based RL (CISPO) against opponents that make 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) and a sampling temperature of 1.25 to push exploration and shake off suboptimal strategies

Done! It beats GPT-5-mini 🏆

---
🎮 Play against the model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
🤗 Model: https://huggingface.co/anakin87/LFM2-2.6B-mr-tictactoe
📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course
🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
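Steps 4 and 5 rely on opponents that make a given share of random moves. The post does not say how the remaining moves are chosen, so this minimal Python sketch assumes they are minimax-optimal. The board encoding and every function name (`opponent_move`, `sample_random_prob`, and so on) are hypothetical and are not taken from Verifiers or the linked course. It also assumes the random-move probability is drawn uniformly per game from each stage's range.

```python
import random
from functools import lru_cache
from typing import Optional

# Hypothetical board encoding: a 9-character string, one "X", "O" or " " per cell,
# read row by row from the top-left corner.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
         (0, 4, 8), (2, 4, 6)]             # diagonals


def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board: str) -> list[int]:
    """Indices of the empty cells."""
    return [i for i, cell in enumerate(board) if cell == " "]


def play(board: str, i: int, player: str) -> str:
    """Return a new board with `player` placed at cell `i`."""
    return board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Game value from X's view (+1 X wins, -1 O wins, 0 draw) with `player` to move."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    moves = legal_moves(board)
    if not moves:
        return 0  # full board and no winner: draw
    other = "O" if player == "X" else "X"
    scores = [minimax(play(board, i, player), other) for i in moves]
    return max(scores) if player == "X" else min(scores)


def best_move(board: str, player: str) -> int:
    """Assumption: the opponent's non-random moves are minimax-optimal."""
    other = "O" if player == "X" else "X"
    pick = max if player == "X" else min
    return pick(legal_moves(board), key=lambda i: minimax(play(board, i, player), other))


def opponent_move(board: str, player: str, random_prob: float, rng: random.Random) -> int:
    """With probability `random_prob` play a uniformly random legal move, else an optimal one."""
    if rng.random() < random_prob:
        return rng.choice(legal_moves(board))
    return best_move(board, player)


def sample_random_prob(stage: int, rng: random.Random) -> float:
    """Assumption: stage 1 (step 4) uses 20-70% random moves, stage 2 (step 5) uses 0-25%,
    with the probability drawn uniformly once per game."""
    low, high = (0.2, 0.7) if stage == 1 else (0.0, 0.25)
    return rng.uniform(low, high)
```

For example, with `random_prob=0.0`, `opponent_move("XX OO    ", "X", 0.0, random.Random(0))` returns 2, the immediate win. Raising `random_prob` toward 1 makes the opponent progressively easier to beat, which is the knob the two RL stages turn in opposite directions.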
Posted an update · 3 days ago (same text as the post above)
anakin87's datasets (12), sorted by most recently updated
anakin87/tictactoe-demo · updated 10 days ago · 5 rows · 44 downloads
anakin87/tictactoe-filtered · updated 21 days ago · 174 rows · 18 downloads
anakin87/tictactoe · updated 21 days ago · 200 rows · 22 downloads
anakin87/Qwen3-0.6B-tuned-alphabet-sort-eval · updated Sep 4, 2025 · 15 rows · 7 downloads
anakin87/Qwen3-0.6B-alphabet-sort-eval · updated Sep 4, 2025 · 15 rows · 14 downloads
anakin87/events-scheduling · updated Apr 26, 2025 · 600 rows · 72 downloads · 2 likes
anakin87/evol-dpo-ita-reranked · updated Jan 14, 2025 · 19.8k rows · 12 downloads · 5 likes
anakin87/gemma-vs-gemma-preferences · updated Jan 14, 2025 · 24.7k rows · 5 downloads
anakin87/fine-instructions-ita-70k · updated Jan 14, 2025 · 69.9k rows · 27 downloads · 4 likes
anakin87/FineTome-single-turn-dedup · updated Jan 11, 2025 · 83.3k rows · 18 downloads
anakin87/tulu-3-sft-mixture-with-language · updated Dec 11, 2024 · 939k rows · 49 downloads
anakin87/medrag-pubmed-chunk · updated Feb 25, 2024 · 15.4k rows · 26 downloads