LLM Lecture 2025 Advanced Competition (AgentBench: DBBench + ALFWorld)

This repository provides a LoRA adapter for Qwen/Qwen3-4B-Instruct-2507, fine-tuned with Unsloth and PEFT (LoRA).

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve multi-turn agent task performance on ALFWorld (household tasks).

Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to learn observation grounding, action selection, tool use, and recovery from errors.
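A minimal sketch of this assistant-only loss masking: labels copy the token ids for assistant turns and are set to the ignore index everywhere else, so every assistant turn in the trajectory contributes to the loss. The segment layout below is illustrative, not the exact chat template used in training.

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the cross-entropy loss

def build_labels(segments):
    """segments: list of (role, token_ids) pairs for one multi-turn trajectory.

    Returns (input_ids, labels), where labels equal the token ids inside
    assistant turns and IGNORE_INDEX elsewhere.
    """
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))
    return input_ids, labels

# Toy trajectory: observation -> action -> observation -> action
segments = [
    ("system", [1, 2]),
    ("user", [3, 4, 5]),         # environment observation
    ("assistant", [6, 7]),       # action 1
    ("user", [8]),               # next observation
    ("assistant", [9, 10, 11]),  # action 2
]
input_ids, labels = build_labels(segments)
```

Because both assistant turns are unmasked, the model is trained on the full action sequence, including recovery behavior after intermediate observations.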

The training data for this run is ALFWorld only (see the datasets listed in the YAML metadata). The organizers' competition evaluation covers AgentBench tasks (DBBench + ALFWorld).

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (PEFT)
  • Max sequence length: 2048
  • Epochs: 2
  • Learning rate: 1.5e-6
  • LoRA: r=64, alpha=128
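The hyperparameters above can be expressed as a PEFT configuration. This is a hedged sketch: only r, alpha, learning rate, epochs, and sequence length are stated in this card; the target modules and other arguments below are assumptions based on common choices for Qwen-style models.

```python
from peft import LoraConfig

# Assumed target_modules -- not specified in this card
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# Stated run settings: lr=1.5e-6, 2 epochs, max sequence length 2048
training_kwargs = dict(learning_rate=1.5e-6, num_train_epochs=2, max_seq_length=2048)
```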

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "KOUJI039/structeval-qwen3-4b-sft-try20"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the LoRA adapter weights to the base model
model = PeftModel.from_pretrained(model, adapter)

# Example generation (prompt content is illustrative)
messages = [{"role": "user", "content": "You are in a kitchen. Your task is to find a mug."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Sources & Terms (IMPORTANT)

Training data:

  • u-10bei/sft_alfworld_trajectory_dataset_v5

This repository does NOT redistribute the dataset. Users must comply with the dataset license and base model terms.

