NormalUhr (Yihua Zhang)

published an article 9 months ago

Article

A Role Shift for AI Infra: From Foundational Support to a Core Engine of Innovation

NormalUhr • Oct 3, 2025 • 1

published an article 11 months ago

Article

Re-understanding KL Approximation from an RL-for-LLM Lens: Notes on “Approximating KL Divergence”

NormalUhr • Aug 11, 2025 • 13

published an article 11 months ago

Article

From GRPO to DAPO and GSPO: What, Why, and How

NormalUhr • Aug 9, 2025 • 128

published an article about 1 year ago

Article

Decorators in Machine Learning

NormalUhr • Jun 8, 2025 • 1

published an article over 1 year ago

Article

DualPipe Explained: A Comprehensive Guide to DualPipe That Anyone Can Understand—Even Without a Distributed Training Background

NormalUhr • Feb 28, 2025 • 19

published an article over 1 year ago

Article

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

NormalUhr • Feb 11, 2025 • 126

published an article over 1 year ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

NormalUhr • Feb 7, 2025 • 295

published an article over 1 year ago

Article

A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons

NormalUhr • Feb 4, 2025 • 36

published an article over 1 year ago

Article

From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning

NormalUhr • Feb 4, 2025 • 17

published an article over 1 year ago

Article

MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression

NormalUhr • Feb 4, 2025 • 23


