Hugging Face Agents Course

Team

university

https://bit.ly/hf-learn-agents

Recent Activity

Jofthomas

updated a dataset about 1 hour ago

agents-course/unit4-students-scores

Viewer • Updated about 1 hour ago • 9.5k • 12.7k • 17

burtenshaw

updated a dataset about 3 hours ago

agents-course/certificates

Preview • Updated 4 minutes ago • 13.4k • 88

sergiopaniego

updated 2 datasets about 21 hours ago

agents-course/final-certificates

Viewer • Updated about 9 hours ago • 8 • 1.89k • 17

agents-course/course-certificates-of-excellence

Viewer • Updated about 9 hours ago • 5.09k • 1.01k • 12

sergiopaniego

posted an update 3 days ago

Simon Willison (@simonw) has asked every new model to draw a pelican riding a bicycle for some time now

you look at the drawing and you know. but there is no number, so nothing can train against it, no?

I turned this idea into an rl env in OpenEnv. now, you can eval any model against it, and train against it with TRL

read the details!🤓

https://huggingface.co/blog/sergiopaniego/pelican-env-openenv

2 replies

sergiopaniego

posted an update 4 days ago

yesterday we had Class 3 of the Training Agents live series

we went through how GRPO works in depth and applied it to real experiments with TRL
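
For readers who want the core idea before watching: GRPO samples several completions per prompt and scores each one relative to the rest of its group, so no separate value model is needed. The snippet below is a minimal sketch of that normalization step only, not the TRL implementation; the function name, the `eps` value, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's group of completions.

    Hypothetical helper: each advantage is the reward minus the group mean,
    divided by the group standard deviation (plus eps to avoid dividing by 0).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for the same prompt, two of them rewarded.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every completion gets the same reward carries no signal.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, which is the signal the policy update then uses.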

sharing the resources in case you want to dig in, enjoy!

🎥 session recording: https://www.youtube.com/watch?v=ztdTed5egrM

📄 slides with links: https://docs.google.com/presentation/d/19v5_HR5B-1CPHuoZ-RXjhgBrGFXGnBNd6TfLhsXWT1c/edit?usp=sharing

sergiopaniego

posted an update 6 days ago

quick reminder! 🚨

tomorrow (Tuesday, July 28), we're back with Class 3 of the Training Agents live series

🧠 what: reinforcement learning for training agents (GRPO): how it works, how to implement it in TRL, and end-to-end examples
🗓️ when: Tuesday, July 28 - 🕔 5:00 PM CEST / 8:30 PM IST
📍 where: Live on @huggingface's X, YouTube, and LinkedIn

live: https://www.youtube.com/watch?v=ztdTed5egrM

class 1: https://x.com/SergioPaniego/status/2069382207618379813
class 2: https://x.com/SergioPaniego/status/2075180665184686187

1 reply

sergiopaniego

posted an update 9 days ago

you can now train your own coding agents with trl + openenv, starting with opencode

we just added end-to-end support for training agent harnesses:

> TRL: a loop-owning training path (AsyncGRPOTrainer + HarnessRolloutWorker) that launches the agent in an OpenEnv session, reads back its trace, reconstructs the training samples, and trains with AsyncGRPO
> OpenEnv: the OpenCode harness environment plus a transparent proxy that forwards the agent's model calls and records each turn's token ids and logprobs

you train the actual opencode agent as is, it runs its own loop and tools and the policy learns from the exact tokens it produced
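
To make "reconstructs the training samples" concrete, here is a rough sketch of how recorded turns could become per-turn samples. Everything in it is hypothetical: `RecordedTurn`, `build_samples`, and the dictionary keys are illustrative names, not the TRL or OpenEnv data format, and sharing one episode reward across all turns is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RecordedTurn:
    """One model call as a recording proxy might store it (hypothetical)."""
    prompt_ids: list[int]       # tokens the agent sent to the model
    completion_ids: list[int]   # tokens the model generated
    logprobs: list[float]       # logprob of each generated token


def build_samples(turns, reward):
    """Turn a recorded episode into one training sample per model call."""
    samples = []
    for turn in turns:
        if len(turn.completion_ids) != len(turn.logprobs):
            raise ValueError("each generated token needs exactly one logprob")
        samples.append({
            "input_ids": turn.prompt_ids + turn.completion_ids,
            # 1 marks tokens the policy produced; only those are trained on.
            "completion_mask": [0] * len(turn.prompt_ids)
                               + [1] * len(turn.completion_ids),
            "old_logprobs": list(turn.logprobs),
            "reward": reward,  # assumption: the episode reward is shared
        })
    return samples


trace = [
    RecordedTurn([1, 2, 3], [10, 11], [-0.1, -0.3]),
    RecordedTurn([1, 2, 3, 10, 11, 4], [12], [-0.2]),
]
samples = build_samples(trace, reward=1.0)
```

Keeping the exact token ids and logprobs from the rollout means the trainer does not have to re-tokenize text and risk training on tokens the agent never produced.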

we're shipping a self-contained example: local subprocess sandbox, DeepCoder problems, validated on Qwen3-8B.

> example: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/opencode.py
> docs: https://huggingface.co/docs/trl/main/openenv

and we're working actively on both sides so expect more 🤓

1 reply

sergiopaniego

posted an update 10 days ago

you can train DiffusionGemma (a block-diffusion LLM) in TRL! and we're sharing an example for it

TRL trainers are made to be easily extended and adapted to different real-world use cases.

in this one, with a single method overridden in SFTTrainer (compute_loss), you can train this model

> example: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_diffusion_gemma.py

sergiopaniego

posted an update 11 days ago

join us next Tuesday, July 28, for Class 3 of the Training Agents live series!

we'll dive into reinforcement learning for agent training, covering the intuition behind GRPO, how it works, and how to implement it in TRL with practical, e2e examples

see you there 🤠

live: https://www.youtube.com/live/ztdTed5egrM

> in case you missed class 1: https://x.com/SergioPaniego/status/2069382207618379813
> and in case you missed class 2: https://x.com/SergioPaniego/status/2075180665184686187

sergiopaniego

posted an update 25 days ago

Frontier models use distillation as a step of their post-training pipelines.

In 2026 it has three jobs: compress a big model into a small one, merge RL experts into a single model, and let a model teach itself.

I wrote up which frontier models use each one and how: https://huggingface.co/blog/sergiopaniego/distillation-2026

It pairs with Class 2 of the Training Agents live series Ben and I are doing, where we teach these techniques hands-on with TRL!
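
As a small illustration of the first job (compressing a big model into a small one), the classic formulation trains the student to match the teacher's softened token distribution. The sketch below computes a forward KL divergence for a single position in plain Python; the function names and the temperature value are assumptions for illustration, and it is not taken from the blog post or from TRL.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Forward KL(teacher || student) over one token position (toy sketch)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [2.0, 1.0, 0.1]
matched = distillation_loss(teacher, [2.0, 1.0, 0.1])     # student agrees
mismatched = distillation_loss(teacher, [0.1, 1.0, 2.0])  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge; a higher temperature flattens both distributions so low-probability tokens contribute more.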

3 replies

sergiopaniego

posted an update about 1 month ago

TRL v1.7.0 is out‼️

+ continuous batching makes GRPO and RLOO 1.25x faster while using 16 GB less VRAM
+ proper MoE post-training across GRPO/RLOO/AsyncGRPO
+ new GMPO trainer
+ AsyncGRPO weight sync + padding-free
+ more

https://github.com/huggingface/trl/releases/tag/v1.7.0

wrote a small article about the continuous batching for GRPO feature

https://huggingface.co/blog/sergiopaniego/cb-trl-grpo

sergiopaniego

posted an update about 1 month ago

Continuous batching just landed in TRL for GRPO!

At 64 generations it runs faster and uses less VRAM than plain generate, no vLLM needed

How it works and when to reach for it, below

https://huggingface.co/blog/sergiopaniego/cb-trl-grpo
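
For intuition on why continuous batching helps when completions have very different lengths, here is a toy scheduler simulation. It is a sketch under simplifying assumptions, not how TRL implements the feature: it counts only decode steps, ignores prefill and memory, and the function names and example lengths are made up for illustration.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Each batch runs until its longest sequence finishes."""
    steps = 0
    for start in range(0, len(lengths), batch_size):
        steps += max(lengths[start:start + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """A finished sequence frees its slot for the next waiting request."""
    waiting = deque(lengths)
    active = []  # remaining tokens to generate for each occupied slot
    steps = 0
    while waiting or active:
        # Refill any free slots before the next decode step.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        steps += 1  # one decode step produces one token per active sequence
        active = [remaining - 1 for remaining in active if remaining > 1]
    return steps


# Tokens to generate per completion (hypothetical numbers), two slots.
lengths = [8, 2, 2, 8, 2, 2]
static_steps = static_batching_steps(lengths, batch_size=2)          # 18
continuous_steps = continuous_batching_steps(lengths, batch_size=2)  # 12
```

With static batching the short completions sit idle until the longest one in their batch finishes, while continuous batching keeps both slots busy, which is where the saving in this toy example comes from.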