metadata
library_name: diffusers
license: mit
pipeline_tag: text-to-video
base_model:
- Wan-AI/Wan2.2-T2V-A14B
Model Summary
This model was trained with GRPO, using UnifiedReward-Flex as the reward model, on the training set of UniGenBench.
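As a rough illustration of the group-relative step that gives GRPO its name, the sketch below normalizes reward scores within one group of samples generated for the same prompt. It is a minimal sketch and not the training code for this model: the function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population std is used; some implementations use sample std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four videos generated from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

# Samples scored above the group mean get positive advantages,
# samples scored below it get negative ones.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Because the advantages are computed relative to the group, only the ranking and spread of the reward scores within a group matter, not their absolute scale.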
The inference code is available on GitHub.
For further details, please refer to the following resources:
- Paper: https://arxiv.org/abs/2602.02380
- Project Page: https://codegoat24.github.io/UnifiedReward/flex
- Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-flex
- Dataset: https://huggingface.co/datasets/CodeGoat24/UnifiedReward-Flex-SFT-90K
- Point of Contact: Yibin Wang
More generated videos are shown at the bottom of the project page.
Citation
@article{unifiedreward-flex,
title={Unified Personalized Reward Model for Vision Generation},
author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
journal={arXiv preprint arXiv:2602.02380},
year={2026}
}

