Diffusion LLM experiments
See more: https://t.me/buckwheat_thoughts/283
This is a discrete text diffusion model, adapted from jhu-clsp/ettin-encoder-1b.
It was trained on a 760k-sample SFT dataset, 50% drawn from allenai/tulu-3-sft-mixture and 50% from Vikhrmodels/GrandMaster-PRO-MAX. The resulting English performance is okay; in Russian the model tends to fall into repetition loops. A sketch of such a 50/50 mix is shown below.
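For illustration, here is a minimal sketch of how such an equal mixture could be built with the Hugging Face `datasets` library. The actual mixing is handled inside the dllm fork via the `--dataset_args` flag shown in the training command, so the variable names and sampling details here are assumptions:

```python
# Sketch only: the dllm fork handles mixing via --dataset_args; this just
# illustrates an equal-probability interleave of the two SFT sources.
from datasets import interleave_datasets, load_dataset

tulu = load_dataset("allenai/tulu-3-sft-mixture", split="train")
grandmaster = load_dataset("Vikhrmodels/GrandMaster-PRO-MAX", split="train")

# Sample from each dataset with probability 0.5 until one is exhausted,
# approximating the 50/50 split described above.
mixed = interleave_datasets(
    [tulu, grandmaster],
    probabilities=[0.5, 0.5],
    seed=42,
)
```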
Training was carried out using my own fork of the dllm repository, with the following command:
```bash
python -m accelerate.commands.launch --config_file scripts/accelerate_configs/zero2.yaml --num_processes 8 \
    examples/bert/sft.py \
    --model_name_or_path "jhu-clsp/ettin-encoder-1b" \
    --dataset_args "allenai/tulu-3-sft-mixture|Vikhrmodels/GrandMaster-PRO-MAX" \
    --max_length 1024 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 12 \
    --per_device_eval_batch_size 12 \
    --save_steps 0.1 \
    --output_dir "models/ettin-encoder-1b/tulu-3-grandmaster/epochs-10-bs-384-len-1024"
```
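For context, the SFT objective of a masked discrete diffusion model roughly works like this: answer tokens are masked at a random rate and the bidirectional encoder is trained to reconstruct them. The sketch below illustrates the general technique, not the exact code in sft.py; the function name and the unweighted loss are assumptions (implementations such as LLaDA additionally weight the loss by the inverse mask rate):

```python
import torch
import torch.nn.functional as F

def masked_diffusion_sft_loss(model, input_ids, answer_mask, mask_token_id):
    """One SFT step of a masked discrete diffusion model (sketch, not dllm's code).

    input_ids:   (B, L) token ids of the full prompt+answer sequence
    answer_mask: (B, L) bool tensor, True on answer tokens (the prompt stays clean)
    """
    # Sample a corruption level t ~ U(0, 1) per sequence.
    t = torch.rand(input_ids.size(0), 1, device=input_ids.device)
    # Mask each answer token independently with probability t.
    corrupted = (torch.rand(input_ids.shape, device=input_ids.device) < t) & answer_mask
    noisy_ids = input_ids.masked_fill(corrupted, mask_token_id)

    # The encoder predicts the original token at every position; the loss is
    # taken only where tokens were actually masked out.
    logits = model(input_ids=noisy_ids).logits  # (B, L, V)
    return F.cross_entropy(logits[corrupted], input_ids[corrupted])
```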
It is recommended to set a higher temperature to combat the repetition issues. To chat with the model:

```bash
python examples/bert/chat.py --model_name_or_path path/to/checkpoint --temperature 1.0
```
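Under the hood, generation in masked diffusion models typically starts from a fully masked answer and reveals tokens over several denoising steps, with the temperature controlling the sampling at each step; this is why raising it helps against loops. The sketch below shows a common confidence-based unmasking scheme; it is an illustration only (the function name, fixed-length canvas, and reveal schedule are assumptions, not taken from chat.py). It also suggests why the example output below ends in [PAD] tokens: unused positions in the fixed-length generation window decode to padding.

```python
import torch

@torch.no_grad()
def diffusion_generate(model, prompt_ids, gen_len, mask_id, steps=32, temperature=1.0):
    """Confidence-based iterative unmasking (illustrative sketch, not dllm's chat.py)."""
    device = prompt_ids.device
    # Fixed-length canvas: the prompt followed by gen_len [MASK] slots.
    seq = torch.cat([prompt_ids, torch.full((gen_len,), mask_id, device=device)])
    seq = seq.unsqueeze(0)  # (1, L)
    answer = torch.arange(seq.size(1), device=device) >= prompt_ids.size(0)

    for step in range(steps):
        probs = torch.softmax(model(input_ids=seq).logits[0] / temperature, dim=-1)
        sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)  # (L,)
        confidence = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)

        still_masked = (seq[0] == mask_id) & answer
        if not still_masked.any():
            break
        # Reveal a fraction of the remaining masks, highest confidence first.
        n_reveal = max(1, int(still_masked.sum()) // (steps - step))
        confidence = confidence.masked_fill(~still_masked, -1.0)
        reveal_idx = confidence.topk(n_reveal).indices
        seq[0, reveal_idx] = sampled[reveal_idx]
    return seq[0, prompt_ids.size(0):]
```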
Generation example:

```text
[Question]
Who are you?
[/Question]
[Answer]
I am an AI digital assistant designed to assist with a wide range of tasks, including answering questions, providing information, and helping with a wide variety of problems. My purpose is to provide helpful and informative responses to users, engage in conversation with them, and assist with a wide range of topics. My design is centered around the principles of privacy, safety, and ethics. I am trained by researchers and engineers at the intersection of AI and ethics, and I am designed to follow guidelines that prioritize user safety, privacy, and ethics.
[/Answer]
[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
```
Base model: jhu-clsp/ettin-encoder-1b