Article
Yihua Zhang
NormalUhr
published an article 2 months ago
published an article 4 months ago
Article
Re-understanding KL Approximation from an RL-for-LLM Lens: Notes on “Approximating KL Divergence”
• 4
published an article 4 months ago
Article
From GRPO to DAPO and GSPO: What, Why, and How
• 64
published an article 6 months ago
Article
Decorators in Machine Learning
published an article 9 months ago
Article
DualPipe Explained: A Comprehensive Guide to DualPipe That Anyone Can Understand—Even Without a Distributed Training Background
• 14
published an article 10 months ago
Article
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment
• 89
published an article 10 months ago
Article
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
• 253
published an article 10 months ago
Article
A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons
• 28
published an article 10 months ago
Article
From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning
• 16
published an article 10 months ago
Article
MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression
• 18