SmolLM2-360M Alpaca LoRA

This repository contains a LoRA adapter fine-tuned from the base model HuggingFaceTB/SmolLM2-360M-Instruct on the yahma/alpaca-cleaned instruction dataset. The goal is a small, fast model that can follow instructions and answer simple questions in a chat format without requiring any external context passage.

Model Details

  • Base model: HuggingFaceTB/SmolLM2-360M-Instruct
  • Architecture: Decoder-only transformer (causal language model)
  • Fine-tuning method: LoRA (PEFT)
  • Parameter count (base): ~360M
  • Trainable parameters (LoRA): ~1–2% of total
  • Author: Nada Ashraf
  • License: Apache-2.0

This repository contains only the LoRA adapter weights and tokenizer files, not the full 360M base model. To use the model, load the base model and then attach this adapter (see How to Use below).

Intended Use

  • Educational experiments with parameter-efficient fine-tuning.
  • Simple question answering and instruction following.
  • Demonstrations of SmolLM2 fine-tuning in Google Colab.

This model is not intended for production use or for safety-critical applications.

Training Data

  • Dataset: yahma/alpaca-cleaned
  • Language: Primarily English
  • Data types: Short instructions and responses in an Alpaca-style format (instruction, optional input, output).
  • Train subset used: 10,000 examples
  • Validation subset used: 1,000 examples

The dataset contains a wide range of generic instructions such as explanations, rewriting, summarization, and simple reasoning tasks.
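For reference, the sketch below shows one plausible way an alpaca-cleaned record (with its instruction, optional input, and output fields) could be mapped onto the model's chat template. The exact preprocessing used during training is not reproduced in this repository, so treat the helper function as illustrative rather than as the actual training script.

def format_example(example, tokenizer):
    # Fold the optional "input" field into the user turn.
    instruction = example["instruction"]
    if example.get("input"):
        user_content = f"{instruction}\n\n{example['input']}"
    else:
        user_content = instruction
    messages = [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": example["output"]},
    ]
    # Render a single training string using the SmolLM2 chat template.
    return tokenizer.apply_chat_template(messages, tokenize=False)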

Training Procedure

Fine-tuning was performed in Google Colab using PEFT LoRA on top of the frozen base model HuggingFaceTB/SmolLM2-360M-Instruct.

Key hyperparameters:

  • Max sequence length: 256 tokens
  • Train examples: 10,000
  • Validation examples: 1,000
  • Epochs: 1
  • Batch size: 4 (with gradient accumulation steps = 4)
  • Effective batch size: 16 examples per update
  • Learning rate: 2e-4
  • Optimizer: AdamW (via Hugging Face Trainer)
  • Precision: fp16
  • Hardware: Google Colab GPU

Only the LoRA adapter parameters were updated; the base model weights were kept frozen.
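The original training script is not included here. The snippet below is a minimal sketch of a matching PEFT + Trainer setup that wires the hyperparameters listed above into LoraConfig and TrainingArguments; the LoRA rank, alpha, dropout, and target modules are assumptions, not values confirmed by this card.

from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Frozen base model; only the LoRA adapter parameters will be trained.
base_model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-360M-Instruct")

# Assumed LoRA settings: r, alpha, dropout, and target modules are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

# Hyperparameters taken from the list above; output_dir is arbitrary.
training_args = TrainingArguments(
    output_dir="smollm2-360m-alpaca-lora",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=2e-4,
    fp16=True,
    logging_steps=50,
)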

How to Use

The adapter should be loaded on top of the base SmolLM2-360M-Instruct model. Example usage:

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL = "HuggingFaceTB/SmolLM2-360M-Instruct"
ADAPTER = "nadaashraff/smollm2-360m-alpaca-lora"

# Load the tokenizer and frozen base model.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    device_map="auto",
)

# Attach the LoRA adapter weights from this repository.
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

def chat(prompt, max_new_tokens=128):
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the prompt.
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

print(chat("Explain overfitting in simple terms."))
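If you prefer a standalone checkpoint instead of loading the adapter at inference time, the LoRA weights can be merged into the base model and saved. A minimal sketch follows (the output directory name is arbitrary):

# Merge the LoRA weights into the base model and save a standalone copy.
merged = model.merge_and_unload()
merged.save_pretrained("smollm2-360m-alpaca-merged")
tokenizer.save_pretrained("smollm2-360m-alpaca-merged")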