Collections
Discover the best community collections!
Collections including paper arxiv:2404.07965

- DataComp-LM: In search of the next generation of training sets for language models
  Paper • 2406.11794 • Published • 54
- Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
  Paper • 2410.02749 • Published • 13
- Fewer Truncations Improve Language Modeling
  Paper • 2404.10830 • Published • 3
- How to Train Long-Context Language Models (Effectively)
  Paper • 2410.02660 • Published • 2

- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 93
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
  Paper • 2404.10667 • Published • 23
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 26
- DoRA: Weight-Decomposed Low-Rank Adaptation
  Paper • 2402.09353 • Published • 30

- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
  Paper • 2210.14986 • Published • 5
- Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
  Paper • 2311.10702 • Published • 20
- Large Language Models as Optimizers
  Paper • 2309.03409 • Published • 77
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
  Paper • 2309.04269 • Published • 33

- Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
  Paper • 2412.15213 • Published • 28
- No More Adam: Learning Rate Scaling at Initialization is All You Need
  Paper • 2412.11768 • Published • 43
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 158
- Autoregressive Video Generation without Vector Quantization
  Paper • 2412.14169 • Published • 14

- Addition is All You Need for Energy-efficient Language Models
  Paper • 2410.00907 • Published • 151
- Emu3: Next-Token Prediction is All You Need
  Paper • 2409.18869 • Published • 95
- An accurate detection is not all you need to combat label noise in web-noisy datasets
  Paper • 2407.05528 • Published • 4
- Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP
  Paper • 2407.00402 • Published • 23

- Associative Recurrent Memory Transformer
  Paper • 2407.04841 • Published • 36
- Mixture-of-Agents Enhances Large Language Model Capabilities
  Paper • 2406.04692 • Published • 60
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
  Paper • 2405.21060 • Published • 67
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 259

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25