---
library_name: diffusers
license: apache-2.0
pipeline_tag: image-to-video
---

<h1 align="center">
RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer
</h1>

<div align="center" class="authors">
Liu Liu,
Xiaofeng Wang,
Guosheng Zhao,
Keyu Li,
Wenkang Qin,
Jiaxiong Qiu,
Zheng Zhu,
Guan Huang,
Zhizhong Su
</div>

<div align="center" style="line-height: 3;">
<a href="https://github.com/HorizonRobotics/RoboTransfer" target="_blank" style="margin: 2px;">
<img alt="Code" src="https://img.shields.io/badge/Code-Github-blue" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://horizonrobotics.github.io/robot_lab/robotransfer" target="_blank" style="margin: 2px;">
<img alt="Project Page" src="https://img.shields.io/badge/Project_Page-blue" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://arxiv.org/abs/2505.23171" target="_blank" style="margin: 2px;">
<img alt="arXiv" src="https://img.shields.io/badge/arXiv-b31b1b" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://youtu.be/dGXKtqDnm5Q" target="_blank" style="margin: 2px;">
<img alt="Video" src="https://img.shields.io/badge/Video-red" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://mp.weixin.qq.com/s/c9-1HPBMHIy4oEwyKnsT7Q" target="_blank" style="margin: 2px;">
<img alt="中文介绍" src="https://img.shields.io/badge/中文介绍-07C160?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

<div align="center">
<img src="assets/pin.jpg" width="40%" alt="RoboTransfer"/>
</div>

---

## Abstract



**RoboTransfer** is a novel diffusion-based video generation framework tailored for robotic visual policy transfer. Unlike conventional approaches, RoboTransfer introduces **geometry-aware synthesis** by injecting **depth and normal priors**, ensuring multi-view consistency across dynamic robotic scenes. The method further supports **explicit control over scene components**, such as **background editing**, **object identity swapping**, and **motion specification**, offering a fine-grained video generation pipeline that benefits embodied learning.

---

## Key Features

- **Geometry-Consistent Diffusion**: Injects global 3D cues (depth, normal) and cross-view interactions for multi-view realism.
- **Scene Component Control**: Enables manipulation of object attributes (pose, identity) and background features.
- **Cross-View Conditioning**: Learns representations from multiple camera views with spatial correspondence.
- **Robotic Policy Transfer**: Facilitates domain adaptation by generating synthetic training data in target domains.

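To make the geometry-aware conditioning concrete, the sketch below shows one plausible way to assemble per-view RGB, depth, and surface-normal maps into a channel-wise conditioning tensor for a multi-view video diffusion model. This is an illustrative example only, not the released RoboTransfer API: the function name, tensor layout, and per-view depth normalization are all assumptions.

```python
import numpy as np

def build_geometry_conditioning(rgb, depth, normals):
    """Stack RGB, depth, and normal maps into one conditioning tensor.

    rgb:     (V, H, W, 3) float32 in [0, 1], for V camera views
    depth:   (V, H, W)    float32 metric depth
    normals: (V, H, W, 3) float32 unit surface normals
    returns: (V, H, W, 7) float32 conditioning tensor
    """
    # Normalize depth per view to [0, 1] so views with different depth
    # ranges contribute comparably (a hypothetical design choice).
    d_min = depth.min(axis=(1, 2), keepdims=True)
    d_max = depth.max(axis=(1, 2), keepdims=True)
    depth_norm = (depth - d_min) / np.maximum(d_max - d_min, 1e-6)
    # Channel-wise concatenation: 3 (RGB) + 1 (depth) + 3 (normals) = 7.
    return np.concatenate(
        [rgb, depth_norm[..., None], normals], axis=-1
    ).astype(np.float32)

# Example: three camera views of a 64x64 scene.
rgb = np.random.rand(3, 64, 64, 3).astype(np.float32)
depth = np.random.rand(3, 64, 64).astype(np.float32) * 2.0
normals = np.random.rand(3, 64, 64, 3).astype(np.float32)
cond = build_geometry_conditioning(rgb, depth, normals)
print(cond.shape)  # (3, 64, 64, 7)
```

In practice such a tensor would be fed to the diffusion backbone alongside the noisy video latents; consult the linked GitHub repository for the actual input format.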

---

## BibTeX

```bibtex
@article{liu2025robotransfer,
  title={RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer},
  author={Liu, Liu and Wang, Xiaofeng and Zhao, Guosheng and Li, Keyu and Qin, Wenkang and Qiu, Jiaxiong and Zhu, Zheng and Huang, Guan and Su, Zhizhong},
  journal={arXiv preprint arXiv:2505.23171},
  year={2025}
}
```