Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2602.05400

Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers

Paper • 2504.20752 • Published Apr 29, 2025 • 94
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Paper • 2504.21233 • Published Apr 30, 2025 • 49
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model

Paper • 2211.11363 • Published Nov 21, 2022 • 1
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20, 2024 • 50

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29, 2024 • 51
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published 12 days ago • 314
The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Paper • 2101.00027 • Published Dec 31, 2020 • 9

about 4 hours ago

Endless Terminals: Scaling RL Environments for Terminal Agents

Paper • 2601.16443 • Published 25 days ago • 16
Linear representations in language models can change dramatically over a conversation

Paper • 2601.20834 • Published 19 days ago • 21
Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published 19 days ago • 99
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Paper • 2601.18778 • Published 21 days ago • 40

Stuff I'm going to read

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 151
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 71
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 20 days ago • 23

Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

Paper • 2506.19290 • Published Jun 24, 2025 • 53
Data Efficacy for Language Model Training

Paper • 2506.21545 • Published Jun 26, 2025 • 11
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 53
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

Paper • 2507.03253 • Published Jul 4, 2025 • 19

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published 12 days ago • 314

AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

Paper • 2602.06855 • Published 10 days ago • 70
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published 12 days ago • 314

My notification

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published 26 days ago • 20
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published 25 days ago • 53
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published 25 days ago • 51
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published Jan 16 • 30

Interested paper

DoPE: Denoising Rotary Position Embedding

Paper • 2511.09146 • Published Nov 12, 2025 • 97
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

Paper • 2511.19365 • Published Nov 24, 2025 • 64
Latent Collaboration in Multi-Agent Systems

Paper • 2511.20639 • Published Nov 25, 2025 • 121
Video Generation Models Are Good Latent Reward Models

Paper • 2511.21541 • Published Nov 26, 2025 • 45

Modifying Large Language Model Post-Training for Diverse Creative Writing

Paper • 2503.17126 • Published Mar 21, 2025 • 36
Guided Self-Evolving LLMs with Minimal Human Supervision

Paper • 2512.02472 • Published Dec 2, 2025 • 55
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published 25 days ago • 51
Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

Paper • 2602.03139 • Published 14 days ago • 41

Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers

Paper • 2504.20752 • Published Apr 29, 2025 • 94
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Paper • 2504.21233 • Published Apr 30, 2025 • 49
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model

Paper • 2211.11363 • Published Nov 21, 2022 • 1
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20, 2024 • 50

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published 12 days ago • 314

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29, 2024 • 51
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published 12 days ago • 314
The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Paper • 2101.00027 • Published Dec 31, 2020 • 9

AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

Paper • 2602.06855 • Published 10 days ago • 70
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published 12 days ago • 314

about 4 hours ago

Endless Terminals: Scaling RL Environments for Terminal Agents

Paper • 2601.16443 • Published 25 days ago • 16
Linear representations in language models can change dramatically over a conversation

Paper • 2601.20834 • Published 19 days ago • 21
Scaling Embeddings Outperforms Scaling Experts in Language Models

Paper • 2601.21204 • Published 19 days ago • 99
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

Paper • 2601.18778 • Published 21 days ago • 40

My notification

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published 26 days ago • 20
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published 25 days ago • 53
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published 25 days ago • 51
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published Jan 16 • 30

Stuff I'm going to read

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 151
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 71
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 20 days ago • 23

Interested paper

DoPE: Denoising Rotary Position Embedding

Paper • 2511.09146 • Published Nov 12, 2025 • 97
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

Paper • 2511.19365 • Published Nov 24, 2025 • 64
Latent Collaboration in Multi-Agent Systems

Paper • 2511.20639 • Published Nov 25, 2025 • 121
Video Generation Models Are Good Latent Reward Models

Paper • 2511.21541 • Published Nov 26, 2025 • 45

Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

Paper • 2506.19290 • Published Jun 24, 2025 • 53
Data Efficacy for Language Model Training

Paper • 2506.21545 • Published Jun 26, 2025 • 11
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

Paper • 2507.04009 • Published Jul 5, 2025 • 53
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

Paper • 2507.03253 • Published Jul 4, 2025 • 19

Modifying Large Language Model Post-Training for Diverse Creative Writing

Paper • 2503.17126 • Published Mar 21, 2025 • 36
Guided Self-Evolving LLMs with Minimal Human Supervision

Paper • 2512.02472 • Published Dec 2, 2025 • 55
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published 25 days ago • 51
Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

Paper • 2602.03139 • Published 14 days ago • 41

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs