# LLM Lecture 2025 Advanced Competition (AgentBench: DBBench + ALFWorld)
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using LoRA (PEFT) with Unsloth.
It contains the LoRA adapter weights only; the base model must be loaded separately.
## Training Objective
This adapter is trained to improve multi-turn agent-task performance on ALFWorld (household tasks).
Loss is applied to all assistant turns in each multi-turn trajectory, so the model learns observation grounding, action selection, tool use, and recovery from errors.
Training data for this run is ALFWorld only (see the datasets listed in the YAML metadata).
The competition evaluation by the organizers covers both AgentBench tasks (DBBench and ALFWorld).
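The "loss on all assistant turns" setup can be sketched with the standard Hugging Face convention of setting non-target labels to `-100` so they are ignored by the cross-entropy loss. This is a minimal illustration of the idea, not the exact training code; the function name and toy token IDs are assumptions.

```python
# Sketch of assistant-turn loss masking for a multi-turn trajectory.
# Tokens from user/system/observation turns get IGNORE_INDEX so only
# assistant-turn tokens contribute to the loss.
IGNORE_INDEX = -100  # label ignored by Hugging Face cross-entropy

def mask_non_assistant(token_ids, assistant_mask):
    """Return labels where only assistant-turn tokens keep their token ID."""
    return [
        tok if is_assistant else IGNORE_INDEX
        for tok, is_assistant in zip(token_ids, assistant_mask)
    ]

# Toy trajectory: user tokens (mask 0) interleaved with assistant tokens (mask 1).
ids = [101, 102, 103, 104, 105, 106]
mask = [0, 0, 1, 1, 0, 1]
print(mask_non_assistant(ids, mask))  # [-100, -100, 103, 104, -100, 106]
```

Because every assistant turn is unmasked (not just the final one), the model receives a learning signal at each step of the trajectory, including the recovery steps after a failed action.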
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (PEFT)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 1.5e-6
- LoRA: r=64, alpha=128
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "KOUJI039/structeval-qwen3-4b-sft-try20"

# Load the base model and tokenizer first; this repo ships only the adapter.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
```
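After loading, inference follows the standard Qwen chat-template flow. This continues the snippet above; the example prompt and `max_new_tokens` value are illustrative assumptions, not recommended settings.

```python
# Continuing from the loading snippet above: run one agent-style turn.
messages = [
    {"role": "user", "content": "You are in a kitchen. Your task: put a clean mug on the coffeemachine."},
]
# apply_chat_template formats the turn and appends the assistant prompt.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```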
## Sources & Terms (IMPORTANT)
Training data:
- u-10bei/sft_alfworld_trajectory_dataset_v5
This repository does NOT redistribute the dataset. Users must comply with the dataset's license and the base model's terms of use.