LLaMA 2.7B Distilled from 7B LoRA

This is a 2.7B parameter LLaMA model distilled from a 7B LoRA-tuned teacher model using the MiniLLM knowledge distillation framework.

Model Details

  • Model Size: 2.7B parameters
  • Teacher Model: LLaMA-7B with LoRA fine-tuning on Dolly
  • Training Method: Knowledge Distillation (MiniLLM)
  • Dataset: Databricks Dolly 15k
  • Framework: MiniLLM

Key Features

  • Compact Size: 60% smaller than the 7B teacher model
  • Efficient Inference: Faster generation with reduced memory footprint
  • Knowledge Preservation: Retains much of the teacher's instruction-following performance through distillation
  • Instruction Following: Fine-tuned for instruction-following tasks

Training Configuration

  • Distillation Method: MiniLLM with sequence-level reverse KL divergence
  • Policy Exploration Temperature: 0.2
  • Response Scoring Ratio: 0.5 (length-normalized)
  • Max Length: 512 tokens
  • Batch Size: 4
  • Learning Rate: 5e-06
  • Gradient Accumulation: 4
  • Training Steps: 3000
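
The hyperparameters above can be gathered into a single configuration object, as in the sketch below. This is purely illustrative: the key names (student_model, exploration_temperature, and so on) are hypothetical and do not mirror the actual MiniLLM command-line arguments.

# Hypothetical configuration sketch; key names are illustrative only and
# simply restate the values listed above.
distill_config = {
    "student_model": "llama-2.7b",
    "teacher_model": "llama-7b-lora-dolly",
    "max_length": 512,                 # maximum sequence length in tokens
    "batch_size": 4,                   # per-device batch size
    "gradient_accumulation_steps": 4,  # effective batch size = 4 * 4 = 16
    "learning_rate": 5e-6,
    "lr_scheduler": "cosine",          # cosine decay (see Training Details)
    "optimizer": "adamw",
    "total_steps": 3000,
    "exploration_temperature": 0.2,    # temperature used for policy exploration
    "response_scoring_ratio": 0.5,     # length-normalized scoring ratio
}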

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/llama-2.7b-distilled-from-7b-lora",
    torch_dtype=torch.float16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/llama-2.7b-distilled-from-7b-lora")

# Generate response
prompt = "Instruction: Explain what is machine learning.\n\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # cap on generated tokens (excludes the prompt)
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
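
If GPU memory is tight, the model can also be loaded with 8-bit quantization through bitsandbytes. This is a minimal sketch assuming bitsandbytes and a recent transformers release are installed; the repository name is the same placeholder as above.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Optional: 8-bit loading for a smaller GPU memory footprint
# (requires the bitsandbytes package).
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model_8bit = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/llama-2.7b-distilled-from-7b-lora",
    quantization_config=quant_config,
    device_map="auto"
)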

Performance Benefits

  • Size: ~2.7B parameters (vs 7B teacher)
  • Speed: ~2.5x faster inference
  • Memory: ~60% less GPU memory required
  • Quality: Maintains strong instruction-following capabilities
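
The speed and memory figures above depend on hardware, dtype, and generation settings. A rough way to check them on your own GPU is to time generation and read PyTorch's peak-memory counter, as in this sketch, which reuses the model, tokenizer, and prompt from the Usage section.

import time
import torch

# Rough latency / memory check; results vary with GPU, dtype, and settings.
torch.cuda.reset_peak_memory_stats()
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/s, "
      f"peak memory {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")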

Limitations

  • May have reduced capabilities compared to the full 7B teacher model
  • Trained primarily on English instruction-following tasks
  • May generate biased or incorrect responses
  • Should be used responsibly with appropriate safety measures

Training Details

This model was trained using the MiniLLM framework with:

  • Distillation Loss: Sequence-level reverse KL divergence (illustrated in the sketch after this list)
  • Policy Exploration: 4 samples per prompt
  • Response Scoring: Length-normalized scoring with ratio 0.5
  • Optimizer: AdamW
  • Learning Rate Schedule: Cosine decay
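
The distillation loss listed above can be pictured with a simplified per-token version of the reverse KL objective. The function below is a hypothetical sketch, not the MiniLLM implementation: the actual training optimizes a sequence-level policy-gradient objective over responses sampled from the student.

import torch.nn.functional as F

def reverse_kl_loss(student_logits, teacher_logits, response_mask):
    # Token-level KL(student || teacher), averaged over response tokens.
    # Simplified illustration only; MiniLLM optimizes this objective at the
    # sequence level via policy gradients rather than as a direct loss.
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits, dim=-1)
    # KL(q || p) = sum_x q(x) * (log q(x) - log p(x))
    kl = (student_log_probs.exp() * (student_log_probs - teacher_log_probs)).sum(-1)
    return (kl * response_mask).sum() / response_mask.sum()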

Citation

If you use this model, please cite the MiniLLM paper:

@inproceedings{minillm,
  title={MiniLLM: Knowledge Distillation of Large Language Models},
  author={Gu, Yuxian and Dong, Li and Wei, Furu and Huang, Minlie},
  booktitle={Proceedings of ICLR},
  year={2024}
}

License

This model is released under the Apache 2.0 license. Note that the underlying LLaMA models are subject to Meta's own usage terms.
