Fine-tuning specs:

from trl import SFTConfig, SFTTrainer

training_params = SFTConfig(
    output_dir="checkpoints",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective batch size of 2
    num_train_epochs=3,
    learning_rate=1e-4,              # lowered from 2e-4 to 1e-4
    weight_decay=0.001,
    dataset_text_field="text",
    report_to="none",
    bf16=False,                      # full-precision (F32) training
    fp16=False,
    dataloader_pin_memory=False,
    remove_unused_columns=False,
    max_length=512,
    gradient_checkpointing=True,     # trades compute for memory
    dataloader_num_workers=0,
    save_strategy="epoch",
    logging_steps=100,
    average_tokens_across_devices=False,  # fix for single-device training
    # loss_type is omitted to avoid a warning; the trainer then defaults to
    # ForCausalLMLoss, which is correct here.
)
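The `model`, `tokenizer`, and `ds` objects referenced below would typically be prepared along these lines. This is a minimal sketch, not the exact setup used: the base checkpoint name and data file are placeholders, since neither is stated above.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

base_checkpoint = "gpt2"  # placeholder for the actual ~0.1B base model

tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token

model = AutoModelForCausalLM.from_pretrained(base_checkpoint)

# Placeholder data path; each record must carry the "text" field named in the config
ds = load_dataset("json", data_files={"train": "train.jsonl"})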

# Disable the KV cache, which is incompatible with gradient checkpointing
model.config.use_cache = False

trainer = SFTTrainer(
    model=model,
    train_dataset=ds['train'],
    processing_class=tokenizer,  # recent TRL versions take processing_class instead of tokenizer
    args=training_params
)
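
Training is then launched with a single call; the TrainOutput shown below is the value that call returns. The save path here is illustrative:

train_result = trainer.train()

# Persist the final weights and tokenizer (path is an assumption, not from the original run)
trainer.save_model("checkpoints/final")
tokenizer.save_pretrained("checkpoints/final")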

Training output:

TrainOutput(
    global_step=16773,
    training_loss=2.056998251788356,
    metrics={
        'train_runtime': 3255.1858,
        'train_samples_per_second': 10.305,
        'train_steps_per_second': 5.153,
        'total_flos': 164188359936000.0,
        'train_loss': 2.056998251788356
    }
)
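
These figures are internally consistent. With a per-device batch of 1 and gradient accumulation of 2, each optimizer step consumes 2 examples, so 16,773 steps over 3 epochs implies a training set of roughly 11,182 examples; this is an inference from the step count, not a figure reported above. A quick check:

# Values copied from the config and TrainOutput above
per_device_batch = 1
grad_accum = 2
epochs = 3
global_step = 16773

examples_per_step = per_device_batch * grad_accum  # 2 examples per optimizer step
steps_per_epoch = global_step // epochs            # 5591
print(steps_per_epoch * examples_per_step)         # ~11182 training examples

# Cross-checks: 10.305 samples/s / 5.153 steps/s ~= 2 samples per step,
# and 16773 steps / 5.153 steps/s ~= 3255 s, matching train_runtime.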
Model size: 0.1B parameters, F32 tensors, stored as safetensors.

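The fine-tuned checkpoint can be loaded for generation in the usual way. A minimal sketch, assuming the model is available on the Hub as gofilipa/LoveIsBlind_Postpods; the prompt and sampling settings are illustrative:

from transformers import pipeline

generator = pipeline("text-generation", model="gofilipa/LoveIsBlind_Postpods")

# Prompt and generation parameters are illustrative, not from the original run
result = generator("Welcome back to the pod.", max_new_tokens=50, do_sample=True)
print(result[0]["generated_text"])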