Instructions to use mingyue0101/codellama-7b-matplotlib-assistant with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use mingyue0101/codellama-7b-matplotlib-assistant with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-Instruct-hf") model = PeftModel.from_pretrained(base_model, "mingyue0101/codellama-7b-matplotlib-assistant") - Notebooks
- Google Colab
- Kaggle
Model Card for codellama-7b-matplotlib-assistant
This model is a fine-tuned version of codellama/CodeLlama-7b-Instruct-hf designed to enhance instruction-following capabilities. It was developed as part of a Master's thesis project.
Model Details
Model Description
The codellama-7b-matplotlib-assistant model is a large language model fine-tuned using the QLoRA (4-bit Quantization + LoRA) technique. The goal of this model was to adapt the base CodeLlama model to better follow user instructions while maintaining its coding and reasoning capabilities.
- Developed by: mingyue0101
- Model type: Causal Language Model (Fine-tuned with PEFT/LoRA)
- Language(s) (NLP): English, Chinese
- License: Apache-2.0 (inherited from CodeLlama)
- Finetuned from model: codellama/CodeLlama-7b-Instruct-hf
Model Sources
- Repository: https://huggingface.co/mingyue0101/codellama-7b-matplotlib-assistant
- Dataset: https://huggingface.co/datasets/mingyue0101/prompt_code_parquet
Uses
Direct Use
The model can be used for text generation, code assistance, and general-purpose instruction following. It is particularly suited for tasks where a balance of technical coding knowledge and conversational instruction following is required.
Out-of-Scope Use
The model should not be used for high-stakes decision-making, generating malicious code, or any application that violates the safety guidelines of the base CodeLlama model.
Bias, Risks, and Limitations
This model may inherit biases present in the training data or the base model. Since it was fine-tuned on a specific dataset (parquet02), it might exhibit limitations when handling domains outside of its training distribution. Users should expect potential hallucinations in complex reasoning tasks.
Recommendations
Users are encouraged to use safety filters when deploying this model in production and to perform domain-specific evaluation before use.
How to Get Started with the Model
Use the code below to load the model in 4-bit precision:
import os
import torch
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
TrainingArguments,
pipeline,
logging,
)
from peft import LoraConfig
from trl import SFTTrainer
# ==========================================
# 1. Global Parameter Configuration
# ==========================================
base_model = "codeparrot/codeparrot" # Base model ID on Hugging Face
new_dataset = "mingyue0101/prompts_modi" # Fine-tuning dataset ID
new_model = "codeparrot_ming03" # Directory name for saving the fine-tuned model
# ==========================================
# 2. Dataset Loading
# ==========================================
dataset = load_dataset(new_dataset, split="train")
# ==========================================
# 3. QLoRA 4-bit Quantization Configuration
# ==========================================
compute_dtype = getattr(torch, "float16")
quant_config = BitsAndBytesConfig(
load_in_4bit=True, # Enable 4-bit quantization storage
bnb_4bit_quant_type="nf4", # Use NormalFloat4 for better precision than FP4
bnb_4bit_compute_dtype=compute_dtype, # Cast to Float16 during matrix multiplication
bnb_4bit_use_double_quant=False, # Disable double quantization
)
# ==========================================
# 4. Load Base Model with Optimizations
# ==========================================
model = AutoModelForCausalLM.from_pretrained(
base_model,
quantization_config=quant_config,
device_map={"": 0} # Force load the model onto the first GPU (GPU 0)
)
model.config.use_cache = False # Must disable KV cache during training to avoid backprop conflicts
model.config.pretraining_tp = 1 # Set tensor parallelism to 1 for single-GPU training
# ==========================================
# 5. Tokenizer Configuration & Alignment
# ==========================================
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token # Causal LMs usually have no pad_token; reuse eos_token
tokenizer.padding_side = "right" # Pad on the right to maintain proper causal attention masks
# ==========================================
# 6. PEFT (Lora) Adapter Hyperparameters
# ==========================================
peft_params = LoraConfig(
r=64, # LoRA rank, controlling the number of trainable parameters
lora_alpha=16, # Scaling factor for LoRA weights
lora_dropout=0.1, # Dropout probability to prevent overfitting in the adapter
bias="none", # Do not train bias parameters
task_type="CAUSAL_LM", # Explicitly declare the task type as Causal LM
fan_in_fan_out="True"
)
# ==========================================
# 7. Training Arguments
# ==========================================
training_params = TrainingArguments(
output_dir="./results", # Output directory for checkpoints and logs
num_train_epochs=1, # Number of training epochs
per_device_train_batch_size=4, # Batch size per device during training
gradient_accumulation_steps=1, # Number of updates steps to accumulate gradients
optim="paged_adamw_32bit", # Use QLoRA paged optimizer to prevent Out-Of-Memory (OOM)
save_steps=25, # Save checkpoint every 25 steps
logging_steps=25, # Log training metrics every 25 steps
learning_rate=2e-4, # Initial learning rate
weight_decay=0.001, # Weight decay coefficient
fp16=False, # Disable standard fp16 (handled by the quantization kernel)
bf16=False,
max_grad_norm=0.3, # Max gradient norm for gradient clipping
max_steps=-1, # Rely on epochs instead of max_steps to control training length
warmup_ratio=0.03, # Linear warmup ratio over training steps
group_by_length=True, # Group sequences of similar lengths into batches to speed up training
lr_scheduler_type="constant", # Learning rate schedule type
report_to="tensorboard" # Use TensorBoard to log training progress
)
# ==========================================
# 8. Start Supervised Fine-Tuning (SFT) & Save
# ==========================================
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
peft_config=peft_params,
dataset_text_field="column0", # Name of the column containing text data in the dataset
max_seq_length=None, # Use default maximum sequence length
tokenizer=tokenizer,
args=training_params,
packing=False, # Disable sample packing (combining multiple examples into one sequence)
)
# Launch the training process
trainer.train()
# Save the trained LoRA adapter weights and tokenizer files
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)
print(f"Training complete! Finetuned weights successfully saved to: {new_model}")
Training Details
Training Data
The model was trained on the mingyue0101/parquet02 dataset. This dataset contains instruction-response pairs formatted for Supervised Fine-Tuning (SFT).
Training Procedure
Training Hyperparameters
- Training regime: QLoRA 4-bit (NF4) mixed precision (fp16)
- Learning rate: 2e-4
- Optimizer: paged_adamw_32bit
- Batch size: 4
- Epochs: 1
- LoRA Rank (r): 64
- LoRA Alpha: 16
- LoRA Dropout: 0.1
- LR Scheduler: constant
- Warmup Ratio: 0.03
Technical Specifications
Model Architecture and Objective
Based on the Llama 2 architecture, this model utilizes grouped-query attention (GQA) and rotary positional embeddings (RoPE), fine-tuned with a causal language modeling objective.
Compute Infrastructure
Software
- PEFT 0.10.0
- Transformers
- Bitsandbytes
- TRL (SFTTrainer)
- Downloads last month
- -
Model tree for mingyue0101/codellama-7b-matplotlib-assistant
Base model
codellama/CodeLlama-7b-Instruct-hf