# LLM Lecture 2025 Advanced Competition (AgentBench: DBBench + ALFWorld)
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using LoRA (PEFT) with Unsloth.
It contains the LoRA adapter weights only; the base model must be loaded separately.
## Training Objective
This adapter is trained to improve multi-turn agent-task performance on ALFWorld (household tasks).
Loss is applied to all assistant turns in each multi-turn trajectory, so the model learns observation grounding, action selection, tool use, and recovery from errors.
Training data for this run is ALFWorld only (see the datasets listed in the YAML metadata).
The competition evaluation by the organizers covers both AgentBench tasks (DBBench and ALFWorld).
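The "loss on all assistant turns" setup can be sketched with the standard Hugging Face convention of setting non-target labels to `-100` so they are ignored by the cross-entropy loss. This is a minimal illustration of the idea, not the exact training code; the function name and toy token IDs are assumptions.

```python
# Sketch of assistant-turn loss masking for a multi-turn trajectory.
# Tokens from user/system/observation turns get IGNORE_INDEX so only
# assistant-turn tokens contribute to the loss.
IGNORE_INDEX = -100  # label ignored by Hugging Face cross-entropy

def mask_non_assistant(token_ids, assistant_mask):
    """Return labels where only assistant-turn tokens keep their token ID."""
    return [
        tok if is_assistant else IGNORE_INDEX
        for tok, is_assistant in zip(token_ids, assistant_mask)
    ]

# Toy trajectory: user tokens (mask 0) interleaved with assistant tokens (mask 1).
ids = [101, 102, 103, 104, 105, 106]
mask = [0, 0, 1, 1, 0, 1]
print(mask_non_assistant(ids, mask))  # [-100, -100, 103, 104, -100, 106]
```

Because every assistant turn is unmasked (not just the final one), the model receives a learning signal at each step of the trajectory, including the recovery steps after a failed action.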
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (PEFT)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 1.5e-6
- LoRA: r=64, alpha=128
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "KOUJI039/structeval-qwen3-4b-sft-try20"

# Load the base model and tokenizer first; this repo ships only the adapter.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
```
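After loading, inference follows the standard Qwen chat-template flow. This continues the snippet above; the example prompt and `max_new_tokens` value are illustrative assumptions, not recommended settings.

```python
# Continuing from the loading snippet above: run one agent-style turn.
messages = [
    {"role": "user", "content": "You are in a kitchen. Your task: put a clean mug on the coffeemachine."},
]
# apply_chat_template formats the turn and appends the assistant prompt.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```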
## Sources & Terms (IMPORTANT)
Training data:
- u-10bei/sft_alfworld_trajectory_dataset_v5
This repository does NOT redistribute the dataset. Users must comply with the dataset's license and the base model's terms of use.