shellzero/gemma2-2b-ft-law-data-tag-generation
This model was converted to MLX format from google/gemma-2-2b.
Refer to the original model card for more details on the model.
```shell
pip install mlx-lm
```
The model was LoRA fine-tuned for 1,500 steps with MLX on ymoslem/Law-StackExchange and on synthetic data generated by GPT-4o and GPT-3.5-Turbo, using the prompt format below.
This fine-tune was one of our best runs on this data and achieved a high F1 score on our evaluation dataset (part of the NVIDIA hackathon).
```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    """Format the question into the prompt format the model was fine-tuned on."""
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, title, question)
```
Here's an example of the system_prompt from the dataset:
```
Read the following title and question about a legal issue and assign the most appropriate tag to it. All tags must be in lowercase, ordered lexicographically and separated by commas.
```
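For illustration, here is what the assembled prompt looks like with that system prompt and a hypothetical title/question (the title and question are made-up examples, not taken from the dataset; `format_prompt` is reproduced so the snippet runs standalone):

```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    # Same helper as defined above, repeated here for a self-contained example.
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, title, question)

system_prompt = (
    "Read the following title and question about a legal issue and assign "
    "the most appropriate tag to it. All tags must be in lowercase, ordered "
    "lexicographically and separated by commas."
)

# Hypothetical example inputs.
prompt = format_prompt(
    system_prompt,
    "Is a verbal agreement binding?",
    "My landlord verbally agreed to reduce my rent. Is that enforceable?",
)
print(prompt)
```

The resulting string opens with `<bos><start_of_turn>user` and ends with `<start_of_turn>model`, leaving the model to complete the tag list.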
Loading the model using mlx_lm
```python
from mlx_lm import generate, load

model, tokenizer = load("shellzero/gemma2-2b-ft-law-data-tag-generation")
response = generate(
    model,
    tokenizer,
    prompt=format_prompt(system_prompt, title, question),
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,
    max_tokens=32,
)
```
Model tree for shellzero/gemma2-2b-ft-law-data-tag-generation
Base model
google/gemma-2-2b