RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models
Abstract
A reinforcement learning-based sim-real co-training framework improves vision-language-action (VLA) policy performance through interactive simulation and real-world data anchoring.
Simulation offers a scalable and low-cost way to enrich vision-language-action (VLA) training, reducing reliance on expensive real-robot demonstrations. However, most sim-real co-training methods rely on supervised fine-tuning (SFT), which treats simulation as a static source of demonstrations and does not exploit large-scale closed-loop interaction. Consequently, real-world gains and generalization are often limited. In this paper, we propose an RL-based sim-real Co-training (RL-Co) framework that leverages interactive simulation while preserving real-world capabilities. Our method follows a generic two-stage design: we first warm-start the policy with SFT on a mixture of real and simulated demonstrations, then fine-tune it with reinforcement learning in simulation while adding an auxiliary supervised loss on real-world data to anchor the policy and mitigate catastrophic forgetting. We evaluate our framework on four real-world tabletop manipulation tasks using two representative VLA architectures, OpenVLA and π₀.₅, and observe consistent improvements over real-only fine-tuning and SFT-based co-training, including +24% real-world success on OpenVLA and +20% on π₀.₅. Beyond higher success rates, RL co-training yields stronger generalization to unseen task variations and substantially improved real-world data efficiency, providing a practical and scalable pathway for leveraging simulation to enhance real-robot deployment.
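To make the second stage concrete, the sketch below shows one possible co-training update in PyTorch: a PPO-style clipped policy-gradient loss on simulated rollouts combined with an auxiliary behavior-cloning (SFT) loss on real-world demonstrations. This is a minimal illustration under stated assumptions, not the paper's exact implementation; the `policy.log_prob` interface, the PPO-style surrogate, and the weighting coefficient `lam` are hypothetical choices made for the example.

```python
import torch

def co_training_step(policy, sim_batch, real_batch, optimizer, lam=0.5, clip_eps=0.2):
    """Hypothetical RL-Co update: RL loss on simulation + supervised anchor on real data.

    Assumptions (not from the paper): `policy.log_prob(obs, actions)` returns
    per-sample log-probabilities; `sim_batch` carries PPO-style rollout fields;
    `lam` weights the auxiliary real-data loss.
    """
    # RL term: clipped policy-gradient surrogate on simulated rollouts.
    new_logp = policy.log_prob(sim_batch["obs"], sim_batch["actions"])
    ratio = torch.exp(new_logp - sim_batch["old_logp"])
    adv = sim_batch["advantages"]
    rl_loss = -torch.min(
        ratio * adv,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv,
    ).mean()

    # Anchor term: supervised (behavior-cloning) loss on real-world demonstrations,
    # intended to preserve real-robot behavior and mitigate catastrophic forgetting.
    bc_loss = -policy.log_prob(real_batch["obs"], real_batch["actions"]).mean()

    loss = rl_loss + lam * bc_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The auxiliary `bc_loss` term is what the abstract describes as anchoring the policy on real-world data while the RL term exploits large-scale closed-loop interaction in simulation.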
Community
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation (2026)
- Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning (2026)
- World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy (2026)
- VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model (2026)
- On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning (2026)
- Sim-and-Human Co-training for Data-Efficient and Generalizable Robotic Manipulation (2026)
- World-Gymnast: Training Robots with Reinforcement Learning in a World Model (2026)