BigScience Workshop

non-profit

https://bigscience.huggingface.co

bigscienceW

bigscience-workshop

AI & ML interests

A one-year-long research workshop on large language models: the Summer of Language Models 21 🌸

Recent Activity

israel authored a paper about 1 month ago

CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation

espejelomar posted an update 3 days ago

Sharing WorldForge with @abdelstark

It's an open-source Python project for evaluating and replaying robotics and world-model workflows.

The useful part is not only calling a model. WorldForge records the run, validates action shapes, translates outputs into actions, and keeps replay artifacts you can inspect later.

The current demo uses LeRobot + LeWorldModel on PushT through the official loader:

stable_worldmodel.policy.AutoCostModel("pusht/lewm")

The harness also has replay-only paths for Cosmos-Policy and GR00T-style outputs, so you can inspect the provider contract from saved artifacts without keeping a GPU server online.
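To make the record, validate, and replay idea concrete, here is a minimal standard-library sketch. It is not WorldForge's API: `validate_action`, `record_run`, `replay`, the JSON layout, and the 2-value action shape are all hypothetical, chosen only to illustrate how a saved artifact can be inspected later without a model or GPU server.

```python
import json
import tempfile
from pathlib import Path

# Assumption: a PushT-style action is a pair of floats (x, y).
ACTION_DIM = 2


def validate_action(action, expected_dim=ACTION_DIM):
    """Return the action as a list of floats; raise ValueError on a shape mismatch."""
    values = [float(v) for v in action]
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(values)}")
    return values


def record_run(raw_outputs, path):
    """Validate each raw model output and write a replayable JSON artifact."""
    steps = [
        {"step": i, "action": validate_action(out)}
        for i, out in enumerate(raw_outputs)
    ]
    Path(path).write_text(json.dumps({"steps": steps}, indent=2))
    return steps


def replay(path):
    """Load a saved artifact and return its actions; no model is needed."""
    data = json.loads(Path(path).read_text())
    return [step["action"] for step in data["steps"]]


if __name__ == "__main__":
    artifact = Path(tempfile.mkdtemp()) / "run.json"
    record_run([(0.1, 0.2), (0.3, 0.4)], artifact)
    print(replay(artifact))
```

Validating before writing means a malformed output fails at record time, so anything that reaches disk can be replayed without re-checking.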

Try it:

pip install worldforge-ai
uv run --extra harness worldforge-harness --flow robotics-compare

Repo: https://github.com/AbdelStark/worldforge
Docs: https://abdelstark.github.io/worldforge/

Pre-1.0, MIT, and actively looking for contributors. Good areas:
- robotics provider adapters
- replay artifacts
- eval flows
- docs & first-run demos

Good first issues: https://github.com/AbdelStark/worldforge/contribute

If you're building robot policy evals or model adapters, would love a PR — or an issue describing what's missing.

w11wo authored a paper 9 days ago

TrajDLM: Topology-Aware Block Diffusion Language Model for Trajectory Generation

Paper • 2605.10020 • Published 11 days ago • 2

odegiber authored 2 papers about 1 month ago

Scaling Low-Resource MT via Synthetic Data Generation with LLMs

Paper • 2505.14423 • Published May 20, 2025 • 2

Open Machine Translation for Esperanto

Paper • 2603.29345 • Published Mar 31

mmosbach authored a paper 2 months ago

LLM2Vec-Gen: Generative Embeddings from Large Language Models

Paper • 2603.10913 • Published Mar 11 • 44

christopher in bigscience/bloom 3 months ago

pretokenizer Regex issues?

#278 opened almost 2 years ago by hpcpony

albertvillanova posted an update 3 months ago

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0