Collections
Discover the best community collections!
Collections including paper arxiv:2401.02731

Collection 1
- A Survey on Data Selection for Language Models
  Paper • 2402.16827 • Published • 4
- Instruction Tuning with Human Curriculum
  Paper • 2310.09518 • Published • 3
- Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
  Paper • 2312.05934 • Published • 1
- Language Models as Agent Models
  Paper • 2212.01681 • Published

Collection 2
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
  Paper • 2311.05556 • Published • 87
- LongAlign: A Recipe for Long Context Alignment of Large Language Models
  Paper • 2401.18058 • Published • 22
- Efficient Tool Use with Chain-of-Abstraction Reasoning
  Paper • 2401.17464 • Published • 21
- Transfer Learning for Text Diffusion Models
  Paper • 2401.17181 • Published • 17

Collection 3
- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 27
- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
  Paper • 2308.12066 • Published • 4
- Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
  Paper • 2303.06182 • Published • 1
- EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
  Paper • 2112.14397 • Published • 1

Collection 4
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
  Paper • 2310.13961 • Published • 5
- Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
  Paper • 2309.09582 • Published • 4
- Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
  Paper • 2310.13127 • Published • 12
- Evaluating the Robustness to Instructions of Large Language Models
  Paper • 2308.14306 • Published • 1

Collection 5
- Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
  Paper • 2401.02731 • Published • 3
- hywu/Camelidae-8x7B
  Text Generation • Updated • 1.71k • 15
- hywu/Camelidae-8x13B
  Text Generation • Updated • 1.72k • 5
- hywu/Camelidae-8x34B
  Text Generation • Updated • 1.88k • 29

Collection 6
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 50
- Qwen Technical Report
  Paper • 2309.16609 • Published • 37
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 7
- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 47

Collection 7
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
  Paper • 2310.08659 • Published • 28
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 45
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- LoRA ensembles for large language model fine-tuning
  Paper • 2310.00035 • Published • 2