Model Card for Gemma-3-1B-GRPO-Vi-Medical-LoRA
This model is a LoRA fine-tune of google/gemma-3-1b-it, trained with 4-bit quantization for efficient memory usage and supporting a maximum sequence length of 2048 tokens. Fine-tuning was done with TRL using parameter-efficient (LoRA) updates rather than full parameter updates, keeping training resource-efficient. For memory-constrained environments, the adapter can also be used with unsloth/gemma-3-1b-it-unsloth-bnb-4bit or unsloth/gemma-3-1b-it loaded in 4-bit.
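For memory-constrained setups, the following is a minimal sketch of loading the base model in 4-bit with bitsandbytes before attaching the adapter. The quantization settings shown (NF4, FP16 compute) are illustrative assumptions, not the exact configuration used for this model, and bitsandbytes must be installed.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Example 4-bit NF4 quantization config (illustrative values, not from the original run)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

# Attach the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, "danhtran2mind/Gemma-3-1B-GRPO-Vi-Medical-LoRA")
model.eval()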
Training procedure
This model was trained with GRPO (Group Relative Policy Optimization), a method introduced in DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
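For reference, a minimal sketch of GRPO training with TRL's GRPOTrainer is shown below. The reward function, toy dataset, and hyperparameters are placeholders for illustration only; they are not the reward functions, data, or settings used to train this model.

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset (placeholder; the real training data is not reproduced here)
train_dataset = Dataset.from_dict({
    "prompt": [
        "Đau dạ dày nên khám ở khoa nào?",
        "Triệu chứng của viêm loét dạ dày là gì?",
    ]
})

# Placeholder reward: favors completions that follow the <reasoning>/<answer> format
def format_reward(completions, **kwargs):
    return [1.0 if "<answer>" in c and "</answer>" in c else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="gemma-3-1b-grpo-sketch",  # hypothetical output path
    num_generations=4,                    # completions sampled per prompt
    max_prompt_length=256,
    max_completion_length=512,
    learning_rate=5e-6,
)

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=format_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()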
Usage
Hugging Face Authentication
import os
from huggingface_hub import login
# Set the Hugging Face API token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_token>"
# Log in to the Hugging Face Hub
login(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
Inference
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel
device = "cuda" if torch.cuda.is_available() else "cpu"
# Define model and LoRA adapter paths
base_model_name = "google/gemma-3-1b-it"
lora_adapter_name = "danhtran2mind/Gemma-3-1B-GRPO-Vi-Medical-LoRA"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# Load base model with optimized settings
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,  # Use FP16 for efficiency
    device_map=device,
    trust_remote_code=True
)
# Apply LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)
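# Optional (not part of the original card): merge the LoRA weights into the base model
# for standalone deployment; merge_and_unload() is a standard PeftModel method.
# model = model.merge_and_unload()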
# Set model to evaluation mode
model.eval()
# System prompt (Vietnamese): "Answer in the following format:"
SYSTEM_PROMPT = """
Trả lời theo định dạng sau đây:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
# Define the question (Vietnamese): "Which hospital department should you visit
# if you suspect a peptic ulcer?"
question = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
            "tại bệnh viện để thăm khám?")
# Set seeds for reproducible sampling
seed = 42
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question},
]
text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False  # Ensure the output is a string
)
# Tokenize the input and move to device
inputs = tokenizer(text, return_tensors="pt").to(device)
# Generate response with TextStreamer
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,  # enable sampling so temperature/top_p/top_k take effect
    temperature=0.7,
    top_p=0.95,
    top_k=64,
    streamer=streamer
)
Example output (the model answers in Vietnamese, recommending the hospital's gastroenterology department):
<reasoning>
Việc nghi ngờ bị loét dạ dày tá tràng cần được thăm khám bởi bác sĩ chuyên khoa tiêu hóa để chẩn đoán chính xác và có phương pháp điều trị phù hợp. Việc tự ý đi khám có thể gây ra những hậu quả nghiêm trọng cho sức khỏe.
</reasoning>
<answer>
Bạn nên đến khoa tiêu hóa của bệnh viện để thăm khám. Bác sĩ sẽ khám và hỏi bệnh sử, thực hiện các xét nghiệm cần thiết để xác định nguyên nhân gây loét dạ dày tá tràng và đưa ra phương pháp điều trị phù hợp.
</answer>
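Because the system prompt constrains responses to the <reasoning>/<answer> format, the answer can be extracted programmatically. The sketch below reuses the model, tokenizer, and inputs from the code above; the extract_answer helper is illustrative and not part of the original card.

import re

def extract_answer(text: str) -> str:
    """Return the content of the <answer>...</answer> block, or an empty string."""
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else ""

# Decode a non-streamed generation and extract only the answer span
output_ids = model.generate(**inputs, max_new_tokens=2048, do_sample=True,
                            temperature=0.7, top_p=0.95, top_k=64)
response = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
print(extract_answer(response))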
Library versions
- PEFT: 0.15.2
- TRL: 0.19.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Datasets: 3.6.0
- Tokenizers: 0.21.2
Citations
Cite GRPO as:
@article{zhihong2024deepseekmath,
    title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
    author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
    year   = 2024,
    eprint = {arXiv:2402.03300},
}
Cite TRL as:
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}