# SmolLM2-360M Alpaca LoRA
This repository contains a LoRA adapter fine-tuned from the base model
HuggingFaceTB/SmolLM2-360M-Instruct on the yahma/alpaca-cleaned
instruction dataset. The goal is to create a small, fast model that can
answer simple questions and follow instructions in a chat-like format
without needing an external context paragraph.
## Model Details

- Base model: HuggingFaceTB/SmolLM2-360M-Instruct
- Architecture: Decoder-only transformer (causal language model)
- Fine-tuning method: LoRA (PEFT)
- Parameter count (base): ~360M
- Trainable parameters (LoRA): ~1–2% of total
- Author: Nada Ashraf
- License: Apache-2.0
This repository contains only the LoRA adapter weights and tokenizer files, not the full 360M base model. To use the model, load the base model first and then attach this adapter, as shown in the How to Use section below.
## Intended Use
- Educational experiments with parameter-efficient fine-tuning.
- Simple question answering and instruction following.
- Demonstrations of SmolLM2 fine-tuning in Google Colab.
This model is not intended for production use or for safety-critical applications.
## Training Data

- Dataset: yahma/alpaca-cleaned
- Language: Primarily English
- Data types: Short instructions and responses in an Alpaca-style format (`instruction`, optional `input`, `output`)
- Train subset used: 10,000 examples
- Validation subset used: 1,000 examples
The dataset contains a wide range of generic instructions such as explanations, rewriting, summarization, and simple reasoning tasks.
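The exact prompt template used during fine-tuning is not documented in this card. As an illustration only, one common way to render an Alpaca-style record (`instruction`, optional `input`, `output`) into a single training string looks like this:

```python
def format_alpaca(example: dict) -> str:
    """Render one Alpaca-style record into a single training string.

    The template below is illustrative; the card does not state which
    exact formatting was used for this adapter.
    """
    if example.get("input"):
        return (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )


# Illustrative record in the Alpaca schema:
print(format_alpaca({
    "instruction": "Rewrite the sentence in the past tense.",
    "input": "She walks to school.",
    "output": "She walked to school.",
}))
```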
## Training Procedure
Fine-tuning was performed in Google Colab using PEFT LoRA on top of the
frozen base model HuggingFaceTB/SmolLM2-360M-Instruct.
Key hyperparameters:
- Max sequence length: 256 tokens
- Train examples: 10,000
- Validation examples: 1,000
- Epochs: 1
- Batch size: 4 (with gradient accumulation steps = 4)
- Effective batch size: 16 examples per update
- Learning rate: 2e-4
- Optimizer: AdamW (via the Hugging Face `Trainer`)
- Precision: fp16
- Hardware: Google Colab GPU
Only the LoRA adapter parameters were updated; the base model weights were kept frozen.
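As a rough sketch, the setup could look like the following with `peft` and `transformers`. The LoRA rank, alpha, and target modules are not listed in this card, so those values are placeholders; the `TrainingArguments` values mirror the hyperparameter list above.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M-Instruct")

# Placeholder LoRA settings: the rank, alpha, and target modules actually
# used for this adapter are not documented in the card above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# These values mirror the hyperparameters listed above: 1 epoch,
# batch size 4 with gradient accumulation 4 (effective batch 16),
# learning rate 2e-4, fp16 precision.
training_args = TrainingArguments(
    output_dir="smollm2-360m-alpaca-lora",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=50,
)
```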
## How to Use
The adapter should be loaded on top of the base SmolLM2-360M-Instruct model. Example usage:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = "HuggingFaceTB/SmolLM2-360M-Instruct"
ADAPTER = "nadaashraff/smollm2-360m-alpaca-lora"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()


def chat(prompt, max_new_tokens=128):
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


print(chat("Explain overfitting in simple terms."))
```
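If you prefer a standalone checkpoint that does not require `peft` at inference time, the adapter can also be merged into the base weights. This is standard `peft` functionality rather than anything specific to this card:

```python
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("smollm2-360m-alpaca-merged")
tokenizer.save_pretrained("smollm2-360m-alpaca-merged")
```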