- Unified Personalized Reward Model for Vision Generation
  Paper • 2602.02380 • Published • 20
- CodeGoat24/FLUX.2-klein-base-9B-UnifiedReward-Flex-lora
  Text-to-Image • Updated • 321 • 15
- CodeGoat24/Wan2.2-T2V-A14B-UnifiedReward-Flex-lora
  Text-to-Video • Updated • 112 • 8
- CodeGoat24/Wan2.1-T2V-14B-UnifiedReward-Flex-lora
  Text-to-Video • Updated • 96 • 6
Collections including paper arxiv:2602.02380
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models
  Paper • 2601.00423 • Published • 10
- GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
  Paper • 2601.05242 • Published • 226
- FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning
  Paper • 2601.18150 • Published • 7
- DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
  Paper • 2601.20218 • Published • 15
- Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
  Paper • 2407.20798 • Published • 24
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 104
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75
- MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
  Paper • 2506.22434 • Published • 10
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
  Paper • 2507.13348 • Published • 79
- RewardDance: Reward Scaling in Visual Generation
  Paper • 2509.08826 • Published • 73
- Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
  Paper • 2510.18876 • Published • 37