Hugging Face
Leandro von Werra
PRO
lvwerra
802 followers · 84 following
https://www.lvwerra.com
AI & ML interests
NLP and RL
Recent Activity
about 1 hour ago · rl-llm-wiki/knowledge-base:
source: arxiv:2304.03279 — Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
about 1 hour ago · rl-llm-wiki/knowledge-base:
topic: iterate process-vs-outcome-rewards — implicit process rewards from outcome labels (Free-Process-Rewards + PRIME)
about 1 hour ago · rl-llm-wiki/knowledge-base:
fix: enrich open-problems with the inner-alignment thread (goal-misgen, power-seeking, deceptive alignment)
lvwerra's papers (17)
arxiv:2510.08697
arxiv:2506.20920
arxiv:2504.05299
arxiv:2502.02737
arxiv:2501.08365
arxiv:2410.24198
arxiv:2406.17557
arxiv:2405.18392
arxiv:2402.19173
arxiv:2310.16944
arxiv:2308.07124
arxiv:2305.06161
arxiv:2303.03915
arxiv:2301.03988
arxiv:2211.15533
arxiv:2211.05100
arxiv:2210.01970