DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published 14 days ago • 34
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 7 days ago • 23
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers Paper • 2602.16968 • Published 2 days ago • 9
jina-embeddings-v5-text: Task-Targeted Embedding Distillation Paper • 2602.15547 • Published 3 days ago • 19
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published 8 days ago • 19
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 8 days ago • 46
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published 17 days ago • 57
We Got Claude to Build CUDA Kernels and teach open models! Article • 24 days ago • 138
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published Jan 6 • 154
OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding Paper • 2601.09575 • Published Jan 14 • 26
lovis93/next-scene-qwen-image-lora-2509 Image-to-Image • Updated Oct 21, 2025 • 36.4k • 576
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 226
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published Jan 8 • 166
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published Jan 6 • 101
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 119