- Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
  Paper • 2504.20752 • Published • 94
- Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
  Paper • 2504.21233 • Published • 49
- AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model
  Paper • 2211.11363 • Published • 1
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50
Collections including paper arxiv:2602.05400

- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 51
- OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
  Paper • 2602.05400 • Published • 314
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling
  Paper • 2101.00027 • Published • 9

- Endless Terminals: Scaling RL Environments for Terminal Agents
  Paper • 2601.16443 • Published • 16
- Linear representations in language models can change dramatically over a conversation
  Paper • 2601.20834 • Published • 21
- Scaling Embeddings Outperforms Scaling Experts in Language Models
  Paper • 2601.21204 • Published • 99
- Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
  Paper • 2601.18778 • Published • 40

- LTX-2: Efficient Joint Audio-Visual Foundation Model
  Paper • 2601.03233 • Published • 151
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  Paper • 2601.07832 • Published • 52
- Motion Attribution for Video Generation
  Paper • 2601.08828 • Published • 71
- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 23

- Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
  Paper • 2506.19290 • Published • 53
- Data Efficacy for Language Model Training
  Paper • 2506.21545 • Published • 11
- Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
  Paper • 2507.04009 • Published • 53
- RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs
  Paper • 2507.03253 • Published • 19

- OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
  Paper • 2601.15369 • Published • 20
- Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
  Paper • 2601.15892 • Published • 53
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
  Paper • 2601.16208 • Published • 51
- NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
  Paper • 2601.11004 • Published • 30

- DoPE: Denoising Rotary Position Embedding
  Paper • 2511.09146 • Published • 97
- DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
  Paper • 2511.19365 • Published • 64
- Latent Collaboration in Multi-Agent Systems
  Paper • 2511.20639 • Published • 121
- Video Generation Models Are Good Latent Reward Models
  Paper • 2511.21541 • Published • 45

- Modifying Large Language Model Post-Training for Diverse Creative Writing
  Paper • 2503.17126 • Published • 36
- Guided Self-Evolving LLMs with Minimal Human Supervision
  Paper • 2512.02472 • Published • 55
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
  Paper • 2601.16208 • Published • 51
- Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
  Paper • 2602.03139 • Published • 41