LLaMA 2.7B Distilled from 7B LoRA
This is a 2.7B parameter LLaMA model distilled from a 7B LoRA-tuned teacher model using the MiniLLM knowledge distillation framework.
Model Details
- Model Size: 2.7B parameters
- Teacher Model: LLaMA-7B with LoRA fine-tuning on Dolly
- Training Method: Knowledge Distillation (MiniLLM)
- Dataset: Databricks Dolly 15k
- Framework: MiniLLM
Key Features
- Compact Size: 60% smaller than the 7B teacher model
- Efficient Inference: Faster generation with a reduced memory footprint (see the loading sketch after this list)
- Knowledge Preservation: Retains much of the teacher's instruction-following performance through distillation
- Instruction Following: Fine-tuned for instruction-following tasks
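If memory is the main constraint, the model can also be loaded with 8-bit quantization via bitsandbytes. This is an optional deployment choice, not part of the released configuration; the repository name is the same placeholder used in the Usage section below, and bitsandbytes must be installed separately.

# Optional: 8-bit loading to reduce GPU memory further (requires bitsandbytes).
# Repository name is a placeholder, as in the Usage section below.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model_8bit = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/llama-2.7b-distilled-from-7b-lora",
    quantization_config=bnb_config,
    device_map="auto",
)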
Training Configuration
- Distillation Method: MiniLLM with sequence-level KL divergence
- Policy Exploration: Temperature 0.2
- Response Scoring: Length-normalized, ratio 0.5
- Max Length: 512 tokens
- Batch Size: 4
- Learning Rate: 5e-06
- Gradient Accumulation: 4
- Training Steps: 3000
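For reference, the hyperparameters listed above can be gathered into a single configuration object, as in the sketch below. The dictionary keys are illustrative placeholders and do not correspond to the actual MiniLLM CLI flags or config fields.

# Illustrative summary of the hyperparameters listed above; key names are
# placeholders, not actual MiniLLM CLI flags or config fields.
distill_config = {
    "teacher_model": "llama-7b-lora-dolly",   # LoRA-tuned LLaMA-7B teacher (placeholder name)
    "student_model": "llama-2.7b",            # distilled 2.7B student (placeholder name)
    "kd_objective": "sequence_level_kl",      # MiniLLM distillation loss
    "exploration_temperature": 0.2,           # policy exploration temperature
    "length_norm_ratio": 0.5,                 # response scoring ratio
    "max_length": 512,
    "batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 5e-6,
    "training_steps": 3000,
}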
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/llama-2.7b-distilled-from-7b-lora",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/llama-2.7b-distilled-from-7b-lora")

# Generate response
prompt = "Instruction: Explain what is machine learning.\n\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
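The prompt above follows a simple Instruction/Response template. A small helper like the one below (illustrative only; it is not shipped with the model or tokenizer) keeps that format consistent across prompts.

def build_prompt(instruction: str) -> str:
    # Illustrative helper, not part of the released model: wraps an instruction
    # in the same Instruction/Response template used in the example above.
    return f"Instruction: {instruction}\n\nResponse:"

prompt = build_prompt("Summarize the main idea behind knowledge distillation.")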
Performance Benefits
- Size: ~2.7B parameters (vs 7B teacher)
- Speed: ~2.5x faster inference
- Memory: ~60% less GPU memory required (a back-of-envelope estimate follows this list)
- Quality: Maintains strong instruction-following capabilities
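As a rough weight-only check on the memory figure (fp16, 2 bytes per parameter; activations and the KV cache are excluded, so real usage is higher for both models):

# Weight-only fp16 memory estimate: 2 bytes per parameter.
student_params = 2.7e9
teacher_params = 7.0e9
bytes_per_param = 2  # float16

student_gb = student_params * bytes_per_param / 1e9  # ~5.4 GB
teacher_gb = teacher_params * bytes_per_param / 1e9  # ~14.0 GB
savings = 1 - student_gb / teacher_gb                # ~0.61, i.e. roughly 60% less
print(f"student ~{student_gb:.1f} GB, teacher ~{teacher_gb:.1f} GB, ~{savings:.0%} less")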
Limitations
- May have reduced capabilities compared to the full 7B teacher model
- Trained primarily on English instruction-following tasks
- May generate biased or incorrect responses
- Should be used responsibly with appropriate safety measures
Training Details
This model was trained using the MiniLLM framework with:
- Distillation Loss: Sequence-level KL divergence (a simplified sketch follows this list)
- Policy Exploration: 4 samples per prompt
- Response Scoring: Length-normalized scoring with ratio 0.5
- Optimizer: AdamW
- Learning Rate Schedule: Cosine decay
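For intuition, the core of the objective is a KL term between the student and teacher distributions. The sketch below is a simplified token-level approximation that assumes access to both models' logits; the actual MiniLLM objective optimizes a sequence-level KL with policy-gradient estimates over student-generated samples, which this snippet does not reproduce.

import torch
import torch.nn.functional as F

def simplified_kl_term(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    # Simplified token-level KL(student || teacher), summed over the sequence and
    # averaged over the batch. Illustrative only: the full MiniLLM objective is a
    # sequence-level KL optimized with policy gradients on sampled responses.
    # Both inputs: [batch, seq_len, vocab_size].
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    kl_per_token = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return kl_per_token.sum(dim=-1).mean()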
Citation
If you use this model, please cite the MiniLLM paper:
@inproceedings{minillm,
  title={MiniLLM: Knowledge Distillation of Large Language Models},
  author={Gu, Yuxian and Dong, Li and Wei, Furu and Huang, Minlie},
  booktitle={Proceedings of ICLR},
  year={2024}
}
License
This model is released under the Apache 2.0 license. Note that LLaMA models are also subject to Meta's usage terms.