RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space Paper • 2606.14700 • Published 6 days ago • 11
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30, 2025 • 44
MetaQuery Instruction Tuning Data Collection We downsample high-resolution images so that the shorter side is 1024 pixels (MetaQuery_Instruct_2.4M) or 512 pixels (MetaQuery_Instruct_2.4M_512res) • 2 items • Updated Jun 24, 2025 • 1
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis Paper • 2505.10046 • Published May 15, 2025 • 9
PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop Paper • 2503.09595 • Published Mar 12, 2025
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14, 2025 • 100