---
license: mit
pipeline_tag: text-generation
---
<h1 align="center">
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
</h1>
<div align="center">
<a href="https://chenlong-clock.github.io">Charlie Zhang</a>, <a href="https://www.phontron.com">Graham Neubig</a>,
<a href="https://xiangyue9607.github.io">Xiang Yue</a>
<br>
Carnegie Mellon University, Language Technologies Institute
</div>
<div align="center">
[arXiv](https://arxiv.org/abs/2512.07783)
[License](LICENSE)

</div>
## Does Reinforcement Learning Truly Extend Reasoning?
This work examines the conflicting views on RL's effectiveness in extending language models' reasoning abilities: some characterize RL as a capability refiner that sharpens what the model already knows, while others argue it induces genuinely new compositional skills. The disagreement persists largely because modern training pipelines offer little experimental control. Our work aims to resolve this conflict through a fully controlled analysis. This repository hosts the mid-training checkpoints used in the extrapolation tasks.
## 🔍 Overview
Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:
* **Extrapolative generalization** to more complex compositions (deeper dependency graphs).
* **Contextual generalization** across diverse surface forms and linguistic contexts.
* How **RL interacts** with prior knowledge, and when it yields **genuine capability gains** beyond pre-training.
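To make the setup above concrete, here is a minimal sketch (our own illustration, not the paper's released code; all names and the exact task format are assumptions) of a synthetic math-style task: each variable is produced by one atomic operation over an earlier variable, forming a dependency chain whose depth controls compositional complexity, and every step of the trace can be re-executed for process-level verification.

```python
# Sketch of a synthetic reasoning task with atomic operations and a
# process-verifiable trace. Chain depth is the knob for extrapolative
# generalization experiments (train shallow, evaluate deep).
import random

ATOMIC_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def make_task(depth, seed=0):
    """Build a chain of `depth` atomic steps and return (trace, answer)."""
    rng = random.Random(seed)
    values = {"x0": rng.randint(1, 9)}
    trace = [f"x0 = {values['x0']}"]
    for i in range(1, depth + 1):
        op = rng.choice(list(ATOMIC_OPS))
        prev, operand = f"x{i-1}", rng.randint(1, 9)
        values[f"x{i}"] = ATOMIC_OPS[op](values[prev], operand)
        trace.append(f"x{i} = {op}({prev}, {operand}) = {values[f'x{i}']}")
    return trace, values[f"x{depth}"]

def verify(trace):
    """Re-execute every step of the trace: process-level verification."""
    env = {}
    for line in trace:
        lhs, rhs = line.split(" = ", 1)
        if "(" in rhs:
            call, claimed = rhs.rsplit(" = ", 1)
            op, args = call.rstrip(")").split("(")
            prev, operand = args.split(", ")
            result = ATOMIC_OPS[op](env[prev], int(operand))
            if result != int(claimed):
                return False  # intermediate step does not check out
            env[lhs] = result
        else:
            env[lhs] = int(rhs)  # base value, taken as given
    return True

trace, answer = make_task(depth=4, seed=1)
assert verify(trace)
```

A deeper chain (larger `depth`) gives a harder composition of the same atomic operations, which is exactly the kind of controlled extrapolation axis the experiments need.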
## 🧠 Key findings
<div align="center">
<img src="assets/findings.png" width="500" />
</div>
You may also find the comic generated by NotebookLM [here](assets/Interplay-LM-Reasoning.pdf).
## Code
The code and data for this work will be released soon at the following GitHub repository: [https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)
## 📚 Citation
If you find this work or code useful, please consider citing:
```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
author={Charlie Zhang and Graham Neubig and Xiang Yue},
year={2025},
eprint={2512.07783},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.07783},
}
```