Diffusion LLM experiments
See more: https://t.me/buckwheat_thoughts/283
This is a discrete text diffusion model, adapted from jhu-clsp/ettin-encoder-1b.
It was trained on a 760k-sample SFT dataset, 50% drawn from allenai/tulu-3-sft-mixture and 50% from Vikhrmodels/GrandMaster-PRO-MAX. The resulting English performance is okay; in Russian the model tends to fall into repetition loops. A sketch of such a 50/50 mix is shown below.
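For illustration, here is a minimal sketch of how such an equal mixture could be built with the Hugging Face `datasets` library. The actual mixing is handled inside the dllm fork via the `--dataset_args` flag shown in the training command, so the variable names and sampling details here are assumptions:

```python
# Sketch only: the dllm fork handles mixing via --dataset_args; this just
# illustrates an equal-probability interleave of the two SFT sources.
from datasets import interleave_datasets, load_dataset

tulu = load_dataset("allenai/tulu-3-sft-mixture", split="train")
grandmaster = load_dataset("Vikhrmodels/GrandMaster-PRO-MAX", split="train")

# Sample from each dataset with probability 0.5 until one is exhausted,
# approximating the 50/50 split described above.
mixed = interleave_datasets(
    [tulu, grandmaster],
    probabilities=[0.5, 0.5],
    seed=42,
)
```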
Training was carried out using my own fork of the dllm repository, with the following command:
```bash
python -m accelerate.commands.launch --config_file scripts/accelerate_configs/zero2.yaml --num_processes 8 \
    examples/bert/sft.py \
    --model_name_or_path "jhu-clsp/ettin-encoder-1b" \
    --dataset_args "allenai/tulu-3-sft-mixture|Vikhrmodels/GrandMaster-PRO-MAX" \
    --max_length 1024 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 12 \
    --per_device_eval_batch_size 12 \
    --save_steps 0.1 \
    --output_dir "models/ettin-encoder-1b/tulu-3-grandmaster/epochs-10-bs-384-len-1024"
```
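For context, the SFT objective of a masked discrete diffusion model roughly works like this: answer tokens are masked at a random rate and the bidirectional encoder is trained to reconstruct them. The sketch below illustrates the general technique, not the exact code in sft.py; the function name and the unweighted loss are assumptions (implementations such as LLaDA additionally weight the loss by the inverse mask rate):

```python
import torch
import torch.nn.functional as F

def masked_diffusion_sft_loss(model, input_ids, answer_mask, mask_token_id):
    """One SFT step of a masked discrete diffusion model (sketch, not dllm's code).

    input_ids:   (B, L) token ids of the full prompt+answer sequence
    answer_mask: (B, L) bool tensor, True on answer tokens (the prompt stays clean)
    """
    # Sample a corruption level t ~ U(0, 1) per sequence.
    t = torch.rand(input_ids.size(0), 1, device=input_ids.device)
    # Mask each answer token independently with probability t.
    corrupted = (torch.rand(input_ids.shape, device=input_ids.device) < t) & answer_mask
    noisy_ids = input_ids.masked_fill(corrupted, mask_token_id)

    # The encoder predicts the original token at every position; the loss is
    # taken only where tokens were actually masked out.
    logits = model(input_ids=noisy_ids).logits  # (B, L, V)
    return F.cross_entropy(logits[corrupted], input_ids[corrupted])
```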
It is recommended to set a higher temperature to combat the repetition issues. To chat with the model:

```bash
python examples/bert/chat.py --model_name_or_path path/to/checkpoint --temperature 1.0
```
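Under the hood, generation in masked diffusion models typically starts from a fully masked answer and reveals tokens over several denoising steps, with the temperature controlling the sampling at each step; this is why raising it helps against loops. The sketch below shows a common confidence-based unmasking scheme; it is an illustration only (the function name, fixed-length canvas, and reveal schedule are assumptions, not taken from chat.py). It also suggests why the example output below ends in [PAD] tokens: unused positions in the fixed-length generation window decode to padding.

```python
import torch

@torch.no_grad()
def diffusion_generate(model, prompt_ids, gen_len, mask_id, steps=32, temperature=1.0):
    """Confidence-based iterative unmasking (illustrative sketch, not dllm's chat.py)."""
    device = prompt_ids.device
    # Fixed-length canvas: the prompt followed by gen_len [MASK] slots.
    seq = torch.cat([prompt_ids, torch.full((gen_len,), mask_id, device=device)])
    seq = seq.unsqueeze(0)  # (1, L)
    answer = torch.arange(seq.size(1), device=device) >= prompt_ids.size(0)

    for step in range(steps):
        probs = torch.softmax(model(input_ids=seq).logits[0] / temperature, dim=-1)
        sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)  # (L,)
        confidence = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)

        still_masked = (seq[0] == mask_id) & answer
        if not still_masked.any():
            break
        # Reveal a fraction of the remaining masks, highest confidence first.
        n_reveal = max(1, int(still_masked.sum()) // (steps - step))
        confidence = confidence.masked_fill(~still_masked, -1.0)
        reveal_idx = confidence.topk(n_reveal).indices
        seq[0, reveal_idx] = sampled[reveal_idx]
    return seq[0, prompt_ids.size(0):]
```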
Generation example:

```text
[Question]
Who are you?
[/Question]
[Answer]
I am an AI digital assistant designed to assist with a wide range of tasks, including answering questions, providing information, and helping with a wide variety of problems. My purpose is to provide helpful and informative responses to users, engage in conversation with them, and assist with a wide range of topics. My design is centered around the principles of privacy, safety, and ethics. I am trained by researchers and engineers at the intersection of AI and ethics, and I am designed to follow guidelines that prioritize user safety, privacy, and ethics.
[/Answer]
[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
```
Base model: jhu-clsp/ettin-encoder-1b