Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2510.08492

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 189 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

Training Research

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70
Don't Just Fine-tune the Agent, Tune the Environment

Paper • 2510.10197 • Published Oct 11, 2025 • 28
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9, 2025 • 125
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 270

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 34

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 17
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 27
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 9
Conditional Diffusion Distillation

Paper • 2310.01407 • Published Oct 2, 2023 • 20

Representation Learning

End-to-End Vision Tokenizer Tuning

Paper • 2505.10562 • Published May 15, 2025 • 22
Global and Local Entailment Learning for Natural World Imagery

Paper • 2506.21476 • Published Jun 26, 2025 • 1
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 291
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

Paper • 2509.01363 • Published Sep 1, 2025 • 58

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 189 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 17
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 27
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 9
Conditional Diffusion Distillation

Paper • 2310.01407 • Published Oct 2, 2023 • 20

Training Research

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70
Don't Just Fine-tune the Agent, Tune the Environment

Paper • 2510.10197 • Published Oct 11, 2025 • 28
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9, 2025 • 125
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 270

Representation Learning

End-to-End Vision Tokenizer Tuning

Paper • 2505.10562 • Published May 15, 2025 • 22
Global and Local Entailment Learning for Natural World Imagery

Paper • 2506.21476 • Published Jun 26, 2025 • 1
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 291
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

Paper • 2509.01363 • Published Sep 1, 2025 • 58

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 90
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 34

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs