Model Description

Qwen/Qwen2.5-Coder-1.5B-Instruct์„ ๊ธฐ๋ฐ˜์œผ๋กœ PEFT๋ฅผ ์ด์šฉํ•˜์—ฌ QLoRA (4-bit quantization + PEFT)ํ•ด๋ณธ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

ํ•™์Šต ๋ฐ์ดํ„ฐ๋Š” beomi/KoAlpaca-RealQA๋ฅผ ์‚ฌ์šฉํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์ž‘์€ ๋ชจ๋ธ์„ ์ด์šฉํ•˜์—ฌ QLoRA๋ฅผ ํ•œ ๊ฒƒ์ด๋‹ค ๋ณด๋‹ˆ ์–‘์งˆ์˜ output์ด ๋‚˜์˜ค์ง€๋Š” ์•Š์ง€๋งŒ QLoRA๋ชจ๋ธ๊ณผ ์›๋ณธ๋ชจ๋ธ์˜ ๋‹ต๋ณ€์ด ์ฐจ์ด๋Š” ํ™•์‹คํžˆ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

Quantization Configuration

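import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store base-model weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,  # de-quantize to fp16 for matmuls
)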
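The sketch below gives a rough sense of how much memory `load_in_4bit` and `bnb_4bit_use_double_quant` save, by estimating the storage cost of each weight. It assumes the block sizes described in the QLoRA paper: 64 weights per absmax constant, fp32 constants that double quantization compresses to 8 bits, and 256 of those per second-level fp32 constant. It also ignores any layers that bitsandbytes leaves unquantized, so the numbers are approximate. The 1.5e9 parameter count is a round figure taken from the model name, not an exact count.

```python
def bits_per_param(double_quant: bool, block_size: int = 64, dq_block_size: int = 256) -> float:
    """Approximate storage cost of one 4-bit weight, including quantization constants."""
    bits = 4.0  # the 4-bit code itself
    if double_quant:
        # absmax constants stored as 8-bit values, plus one fp32 constant per dq_block_size of them
        bits += 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # one fp32 absmax constant per block of weights
        bits += 32 / block_size
    return bits


def weight_gb(n_params: float, bits: float) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9


n_params = 1.5e9  # assumption: round figure from the model name
print(f"fp16:              {weight_gb(n_params, 16):.2f} GB")
print(f"nf4:               {weight_gb(n_params, bits_per_param(False)):.2f} GB")
print(f"nf4 + double quant: {weight_gb(n_params, bits_per_param(True)):.2f} GB")
```

Under these assumptions, double quantization reduces the overhead from 0.5 to about 0.127 extra bits per weight. The main saving comes from moving from 16-bit to 4-bit weights.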

LoRA Configuration

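from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                 # rank of the low-rank update matrices
    lora_alpha=32,       # updates are scaled by lora_alpha / r = 4
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Qwen2.5 attention projections are named q_proj/k_proj/v_proj/o_proj. "c_attn" is a
    # GPT-2 / Qwen(1) module name with no match here, so only q_proj and v_proj get adapters.
    target_modules=["c_attn", "q_proj", "v_proj"]
)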
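LoRA keeps the base weight W frozen (here stored in 4 bits) and adds a trainable low-rank update. An adapted layer therefore computes h = W x + (lora_alpha / r) · B A x, where A has shape r × d_in and B has shape d_out × r. The dependency-free sketch below shows that forward pass and the resulting trainable-parameter count for this config (r=8, lora_alpha=32, so the scaling is 4). The layer sizes are assumptions about the Qwen2.5-1.5B architecture and are not stated in this card: 28 layers, q_proj 1536→1536, and v_proj 1536→256. Dropout is omitted because it only applies during training.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, r: int, alpha: float) -> Vector:
    """h = W x + (alpha / r) * B (A x); W is frozen, A and B are the trainable adapter."""
    scaling = alpha / r
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [h0 + scaling * d for h0, d in zip(base, delta)]


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)


# Assumed Qwen2.5-1.5B shapes (hypothetical here; check the model's config.json)
NUM_LAYERS, HIDDEN, KV_DIM = 28, 1536, 256
RANK, ALPHA = 8, 32
per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)  # q_proj + v_proj
print("scaling:", ALPHA / RANK)
print("trainable LoRA parameters:", NUM_LAYERS * per_layer)

# Toy example: with B initialised to zeros (PEFT's default), the output equals W x
w = [[1.0, 2.0], [3.0, 4.0]]
a = [[0.5, -0.5]]  # rank 1 for the toy example
b = [[0.0], [0.0]]
print(lora_forward(w, a, b, [1.0, 1.0], r=1, alpha=1.0))
```

Because B starts at zero, training begins from the unmodified base model. Only A and B are updated, which is why QLoRA fits on a single consumer GPU.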

Training Arguments

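from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./qlora-output",    # hypothetical path; required by Transformers 4.46 but not given in the original
    num_train_epochs=8,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 4 x 4 = 16 per device
    eval_strategy="steps",          # called evaluation_strategy in older Transformers releases
    eval_steps=300,
    save_strategy="steps",
    save_steps=300,                 # must be a multiple of eval_steps for load_best_model_at_end
    logging_steps=300,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False
)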
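The step-based settings above interact in two simple ways. First, each optimizer step consumes per_device_train_batch_size × gradient_accumulation_steps examples per GPU. Second, load_best_model_at_end needs checkpoints to line up with evaluations. The sketch below checks both, assuming a single GPU (the runtime environment lists one RTX 4070 Ti). The helper names are hypothetical and not part of the Transformers API.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


def best_model_tracking_ok(eval_steps: int, save_steps: int) -> bool:
    """With step strategies, load_best_model_at_end needs save_steps to be a multiple of eval_steps."""
    return save_steps % eval_steps == 0


batch = effective_batch_size(per_device=4, grad_accum=4)  # assumes one GPU
print("effective batch size:", batch)
print("checkpoints align with evals:", best_model_tracking_ok(eval_steps=300, save_steps=300))
# Approximate number of training examples processed by the last logged step (8100)
print("examples processed by step 8100:", 8100 * batch)
```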

Training Progress

Step    Training Loss    Validation Loss
300     1.595000         1.611501
600     1.593300         1.596210
900     1.577600         1.586121
1200    1.564600         1.577804
...     ...              ...
7200    1.499700         1.525933
7500    1.493400         1.525612
7800    1.491000         1.525330
8100    1.499900         1.525138
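The losses above are mean token-level cross-entropy (natural log, the PyTorch default), so they convert to perplexity via exp(loss). The sketch below applies that conversion to the first and last validation losses in the table. It also finds the lowest-loss step among the rows shown, which is the checkpoint that metric_for_best_model="eval_loss" with greater_is_better=False would prefer among those rows. Elided rows are not included.

```python
import math

# (step, training loss, validation loss) rows shown in the table above
rows = [
    (300, 1.595000, 1.611501),
    (600, 1.593300, 1.596210),
    (900, 1.577600, 1.586121),
    (1200, 1.564600, 1.577804),
    (7200, 1.499700, 1.525933),
    (7500, 1.493400, 1.525612),
    (7800, 1.491000, 1.525330),
    (8100, 1.499900, 1.525138),
]


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity corresponding to a mean natural-log cross-entropy loss."""
    return math.exp(mean_ce_loss)


first, last = rows[0], rows[-1]
print(f"validation perplexity: {perplexity(first[2]):.3f} -> {perplexity(last[2]):.3f}")

best_step, _, best_val = min(rows, key=lambda row: row[2])
print(f"lowest validation loss among shown rows: step {best_step} ({best_val})")
```

Validation loss is still decreasing slightly at step 8100, but the gains between 7200 and 8100 are small.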

Inference Code

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Quantization config (must match QLoRA settings used during fine-tuning)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load tokenizer and model (local or hub path)
model_path = "onebeans/Qwen2.5-Coder-KoInstruct-QLoRA"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto"
)
model.eval()

# Define prompt using ChatML format (Qwen-style)
def build_chatml_prompt(question: str) -> str:
    # System prompt (Korean): "You are a helpful Korean-language assistant."
    system_msg = "<|im_start|>system\n당신은 유용한 한국어 도우미입니다.<|im_end|>\n"
    user_msg = f"<|im_start|>user\n{question}<|im_end|>\n"
    return system_msg + user_msg + "<|im_start|>assistant\n"

# Run inference
def generate_response(question: str, max_new_tokens: int = 128) -> str:
    prompt = build_chatml_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

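    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding; top_p/temperature only take effect with do_sample=True
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)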

# Example
question = "ํ•œ๊ตญ์˜ ์ˆ˜๋„๋Š” ์–ด๋””์ธ๊ฐ€์š”?" # ๊ธฐ์กด ๋ชจ๋ธ(Qwen/Qwen2.5-Coder-1.5B-Instruct)์˜ ์‘๋‹ต -> ํ•œ๊ตญ์˜ ์ˆ˜๋„๋Š” ์„œ์šธ์ž…๋‹ˆ๋‹ค.
response = generate_response(question)
print("๋ชจ๋ธ ์‘๋‹ต:\n", response)
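The build_chatml_prompt function handles only a single question. The same ChatML layout can be extended to a list of messages for multi-turn conversations, as in the dependency-free sketch below. build_chatml_from_messages is a hypothetical helper and is not part of this repository. The library alternative is tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True), which renders the template shipped with the tokenizer. Its exact output, such as any default system message, may differ from this hand-written version.

```python
from typing import Dict, List

# Same system prompt as build_chatml_prompt ("You are a helpful Korean-language assistant.")
SYSTEM_PROMPT = "당신은 유용한 한국어 도우미입니다."


def build_chatml_from_messages(messages: List[Dict[str, str]], system: str = SYSTEM_PROMPT) -> str:
    """Render role/content messages in ChatML and open an assistant turn for generation."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for msg in messages:
        if msg["role"] not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {msg['role']}")
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt for the next reply
    return "".join(parts)


conversation = [
    {"role": "user", "content": "한국의 수도는 어디인가요?"},          # "What is the capital of Korea?"
    {"role": "assistant", "content": "한국의 수도는 서울입니다."},     # "The capital of Korea is Seoul."
    {"role": "user", "content": "그 도시의 인구는 얼마인가요?"},       # "What is that city's population?"
]
print(build_chatml_from_messages(conversation))
```

For a single user message, this produces exactly the same string as build_chatml_prompt. The result can be tokenized and passed to model.generate in the same way.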

Runtime Environment

Windows 10

NVIDIA GeForce RTX 4070 Ti

Framework Versions

Python: 3.10.14

PyTorch: 1.12.1

Transformers: 4.46.2

Datasets: 3.2.0

Tokenizers: 0.20.3

PEFT: 0.8.2
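To compare a local setup against the versions listed above, the installed package versions can be read with the standard-library importlib.metadata module. This is a minimal sketch. The expected values are copied from the list above, and Python is reported separately. bitsandbytes is needed for 4-bit loading but is not pinned in this card, so the sketch only reports its version. Local build suffixes such as "+cu116" are ignored when comparing.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card (distribution name -> version)
EXPECTED = {
    "torch": "1.12.1",
    "transformers": "4.46.2",
    "datasets": "3.2.0",
    "tokenizers": "0.20.3",
    "peft": "0.8.2",
}


def installed_versions(names):
    """Return {name: installed version, or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("python:", sys.version.split()[0], "(card: 3.10.14)")
for name, got in installed_versions(list(EXPECTED) + ["bitsandbytes"]).items():
    if name not in EXPECTED:
        status = "not pinned in this card"
    else:
        base = got.split("+")[0] if got else None  # drop local build suffix, e.g. "+cu116"
        status = "ok" if base == EXPECTED[name] else f"differs (card: {EXPECTED[name]})"
    print(f"{name}: installed={got} -> {status}")
```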
