# Training Information
The model was trained using the Megatron-LM framework on the LUMI HPC supercomputer. The training utilized 64 AMD MI250x nodes, totaling approximately 165000 GPU hours.
## Intermediate Checkpoints
We have released intermediate checkpoints so that the model's training progression can be examined. The checkpoints are available in separate branches, with a new checkpoint released every 10,000 training steps.
The naming convention is `checkpoint_0xxxxx00`. For example, the checkpoint for 50,000 iterations is named `checkpoint_0050000`. The available checkpoints range from `checkpoint_0010000` up to `checkpoint_0953675`. The final checkpoint, `checkpoint_0953675`, is located in the main branch.
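As a rough illustration, the sketch below shows how an individual intermediate checkpoint could be loaded from its branch with the Hugging Face `transformers` library. The repository id is a placeholder and the causal-LM model class is an assumption about this model's architecture; adjust both as needed.

```python
# Minimal sketch: load an intermediate checkpoint by pointing `revision`
# at the branch that holds it. Assumes a causal-LM checkpoint layout.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/your-model"   # placeholder: substitute the actual repository id
revision = "checkpoint_0050000"   # branch containing the 50,000-step checkpoint

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)

# The final checkpoint (checkpoint_0953675) lives in the default main branch,
# so omitting `revision` loads it.
```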