Fine-tuning specs:
```python
from trl import SFTConfig, SFTTrainer

training_params = SFTConfig(
    output_dir="checkpoints",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    learning_rate=1e-4,  # lowered from 2e-4 to 1e-4
    weight_decay=0.001,
    dataset_text_field="text",
    report_to="none",
    bf16=False,
    fp16=False,
    dataloader_pin_memory=False,
    remove_unused_columns=False,
    max_length=512,
    gradient_checkpointing=True,
    dataloader_num_workers=0,
    save_strategy="epoch",
    logging_steps=100,
    average_tokens_across_devices=False,  # fix for single-device training
    # loss_type is omitted to avoid a warning; the trainer automatically
    # falls back to ForCausalLMLoss, which is correct here.
)

# Gradient checkpointing is incompatible with the KV cache, so disable it.
model.config.use_cache = False

trainer = SFTTrainer(
    model=model,
    train_dataset=ds["train"],
    processing_class=tokenizer,
    args=training_params,
)
```
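
The `model`, `tokenizer`, and `ds` objects referenced above must already be in scope. A minimal sketch of that setup, assuming the openai-community/gpt2 base model and a placeholder dataset id (the actual data must expose the "text" column named in `dataset_text_field`):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

ds = load_dataset("your_dataset_id")  # placeholder, not the actual dataset

# With the trainer constructed as above, training is launched with:
trainer.train()
```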
Training outputs:
```python
TrainOutput(
    global_step=16773,
    training_loss=2.056998251788356,
    metrics={
        'train_runtime': 3255.1858,
        'train_samples_per_second': 10.305,
        'train_steps_per_second': 5.153,
        'total_flos': 164188359936000.0,
        'train_loss': 2.056998251788356,
    },
)
```
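
A quick consistency check on these numbers (an inference from the config, not stated above): with per_device_train_batch_size=1 and gradient_accumulation_steps=2, each optimizer step consumes 2 examples, so 16,773 steps correspond to about 33,546 examples, or roughly 11,182 per epoch over 3 epochs. This matches train_samples_per_second × train_runtime ≈ 10.305 × 3255.19 ≈ 33,545.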
Model tree for gofilipa/LoveIsBlind_Postpods
Base model: openai-community/gpt2
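
Once pushed to the Hub, the fine-tuned model can be loaded by its id for generation; a minimal sketch (the prompt and sampling parameters here are illustrative, not from the original):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gofilipa/LoveIsBlind_Postpods")
out = generator("Welcome back to the pod,", max_new_tokens=50, do_sample=True)
print(out[0]["generated_text"])
```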