# SmolLM2-360M Alpaca LoRA
This repository contains a LoRA adapter fine-tuned from the base model
HuggingFaceTB/SmolLM2-360M-Instruct on the yahma/alpaca-cleaned
instruction dataset. The goal is to create a small, fast model that can
answer simple questions and follow instructions in a chat-like format
without needing an external context paragraph.
## Model Details

- Base model: HuggingFaceTB/SmolLM2-360M-Instruct
- Architecture: Decoder-only transformer (causal language model)
- Fine-tuning method: LoRA (PEFT)
- Parameter count (base): ~360M
- Trainable parameters (LoRA): ~1–2% of total
- Author: Nada Ashraf
- License: Apache-2.0
This repository contains only the LoRA adapter weights and tokenizer files, not the full 360M base model. To use the model, load the base model first and then attach this adapter, as shown in the How to Use section below.
## Intended Use
- Educational experiments with parameter-efficient fine-tuning.
- Simple question answering and instruction following.
- Demonstrations of SmolLM2 fine-tuning in Google Colab.
This model is not intended for production use or for safety-critical applications.
## Training Data

- Dataset: yahma/alpaca-cleaned
- Language: Primarily English
- Data types: Short instructions and responses in an Alpaca-style format (`instruction`, optional `input`, `output`)
- Train subset used: 10,000 examples
- Validation subset used: 1,000 examples
The dataset contains a wide range of generic instructions such as explanations, rewriting, summarization, and simple reasoning tasks.
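The exact prompt template used during fine-tuning is not documented in this card. As an illustration only, one common way to render an Alpaca-style record (`instruction`, optional `input`, `output`) into a single training string looks like this:

```python
def format_alpaca(example: dict) -> str:
    """Render one Alpaca-style record into a single training string.

    The template below is illustrative; the card does not state which
    exact formatting was used for this adapter.
    """
    if example.get("input"):
        return (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )


# Illustrative record in the Alpaca schema:
print(format_alpaca({
    "instruction": "Rewrite the sentence in the past tense.",
    "input": "She walks to school.",
    "output": "She walked to school.",
}))
```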
## Training Procedure
Fine-tuning was performed in Google Colab using PEFT LoRA on top of the
frozen base model HuggingFaceTB/SmolLM2-360M-Instruct.
Key hyperparameters:
- Max sequence length: 256 tokens
- Train examples: 10,000
- Validation examples: 1,000
- Epochs: 1
- Batch size: 4 (with gradient accumulation steps = 4)
- Effective batch size: 16 examples per update
- Learning rate: 2e-4
- Optimizer: AdamW (via the Hugging Face `Trainer`)
- Precision: fp16
- Hardware: Google Colab GPU
Only the LoRA adapter parameters were updated; the base model weights were kept frozen.
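As a rough sketch, the setup could look like the following with `peft` and `transformers`. The LoRA rank, alpha, and target modules are not listed in this card, so those values are placeholders; the `TrainingArguments` values mirror the hyperparameter list above.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M-Instruct")

# Placeholder LoRA settings: the rank, alpha, and target modules actually
# used for this adapter are not documented in the card above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# These values mirror the hyperparameters listed above: 1 epoch,
# batch size 4 with gradient accumulation 4 (effective batch 16),
# learning rate 2e-4, fp16 precision.
training_args = TrainingArguments(
    output_dir="smollm2-360m-alpaca-lora",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=50,
)
```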
## How to Use
The adapter should be loaded on top of the base SmolLM2-360M-Instruct model. Example usage:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = "HuggingFaceTB/SmolLM2-360M-Instruct"
ADAPTER = "nadaashraff/smollm2-360m-alpaca-lora"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()


def chat(prompt, max_new_tokens=128):
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


print(chat("Explain overfitting in simple terms."))
```
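If you prefer a standalone checkpoint that does not require `peft` at inference time, the adapter can also be merged into the base weights. This is standard `peft` functionality rather than anything specific to this card:

```python
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("smollm2-360m-alpaca-merged")
tokenizer.save_pretrained("smollm2-360m-alpaca-merged")
```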