Model Card for Gemma-3-1B-GRPO-Vi-Medical-LoRA
This model is a LoRA fine-tune of google/gemma-3-1b-it, trained with 4-bit quantization for efficient memory usage and supporting a maximum sequence length of 2048 tokens. Fine-tuning was done with TRL using parameter-efficient (LoRA) updates rather than full parameter updates, keeping training resource-efficient. For memory-constrained environments, the adapter can also be used with unsloth/gemma-3-1b-it-unsloth-bnb-4bit or unsloth/gemma-3-1b-it loaded in 4-bit.
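For memory-constrained setups, the following is a minimal sketch of loading the base model in 4-bit with bitsandbytes before attaching the adapter. The quantization settings shown (NF4, FP16 compute) are illustrative assumptions, not the exact configuration used for this model, and bitsandbytes must be installed.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Example 4-bit NF4 quantization config (illustrative values, not from the original run)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

# Attach the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, "danhtran2mind/Gemma-3-1B-GRPO-Vi-Medical-LoRA")
model.eval()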
Training procedure
This model was trained with GRPO (Group Relative Policy Optimization), a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
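For reference, a minimal sketch of GRPO training with TRL's GRPOTrainer is shown below. The reward function, toy dataset, and hyperparameters are placeholders for illustration only; they are not the reward functions, data, or settings used to train this model.

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset (placeholder; the real training data is not reproduced here)
train_dataset = Dataset.from_dict({
    "prompt": [
        "Đau dạ dày nên khám ở khoa nào?",
        "Triệu chứng của viêm loét dạ dày là gì?",
    ]
})

# Placeholder reward: favors completions that follow the <reasoning>/<answer> format
def format_reward(completions, **kwargs):
    return [1.0 if "<answer>" in c and "</answer>" in c else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="gemma-3-1b-grpo-sketch",  # hypothetical output path
    num_generations=4,                    # completions sampled per prompt
    max_prompt_length=256,
    max_completion_length=512,
    learning_rate=5e-6,
)

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=format_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()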
Usage
Hugging Face Authentication
import os
from huggingface_hub import login
# Set the Hugging Face API token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_token>"
# Log in to the Hugging Face Hub
login(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
Inference
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel
device = "cuda" if torch.cuda.is_available() else "cpu"
# Define model and LoRA adapter paths
base_model_name = "google/gemma-3-1b-it"
lora_adapter_name = "danhtran2mind/Gemma-3-1B-GRPO-Vi-Medical-LoRA"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# Load base model with optimized settings
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,  # Use FP16 for efficiency
    device_map=device,
    trust_remote_code=True
)
# Apply LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)
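# Optional (not part of the original card): merge the LoRA weights into the base model
# for standalone deployment; merge_and_unload() is a standard PeftModel method.
# model = model.merge_and_unload()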
# Set model to evaluation mode
model.eval()
# System prompt (Vietnamese): "Answer in the following format:"
SYSTEM_PROMPT = """
Trả lời theo định dạng sau đây:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
# Define the question (Vietnamese): "Which hospital department should you visit
# if you suspect a peptic ulcer?"
question = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
            "tại bệnh viện để thăm khám?")
# Set seeds for reproducible sampling
seed = 42
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question},
]
text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False  # Ensure the output is a string
)
# Tokenize the input and move to device
inputs = tokenizer(text, return_tensors="pt").to(device)
# Generate response with TextStreamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,  # enable sampling so temperature/top_p/top_k take effect
    temperature=0.7,
    top_p=0.95,
    top_k=64,
    streamer=streamer
)
Example output (the model answers in Vietnamese, recommending the hospital's gastroenterology department):
<reasoning>
Việc nghi ngờ bị loét dạ dày tá tràng cần được thăm khám bởi bác sĩ chuyên khoa tiêu hóa để chẩn đoán chính xác và có phương pháp điều trị phù hợp. Việc tự ý đi khám có thể gây ra những hậu quả nghiêm trọng cho sức khỏe.
</reasoning>
<answer>
Bạn nên đến khoa tiêu hóa của bệnh viện để thăm khám. Bác sĩ sẽ khám và hỏi bệnh sử, thực hiện các xét nghiệm cần thiết để xác định nguyên nhân gây loét dạ dày tá tràng và đưa ra phương pháp điều trị phù hợp.
</answer>
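Because the system prompt constrains responses to the <reasoning>/<answer> format, the answer can be extracted programmatically. The sketch below reuses the model, tokenizer, and inputs from the code above; the extract_answer helper is illustrative and not part of the original card.

import re

def extract_answer(text: str) -> str:
    """Return the content of the <answer>...</answer> block, or an empty string."""
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else ""

# Decode a non-streamed generation and extract only the answer span
output_ids = model.generate(**inputs, max_new_tokens=2048, do_sample=True,
                            temperature=0.7, top_p=0.95, top_k=64)
response = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
print(extract_answer(response))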
Library versions
- PEFT: 0.15.2
- TRL: 0.19.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Datasets: 3.6.0
- Tokenizers: 0.21.2
Citations
Cite GRPO as:
@article{zhihong2024deepseekmath,
    title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year   = 2024,
    eprint = {arXiv:2402.03300},
}
Cite TRL as:
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}