CinnabarLM Python
CinnabarLM Python is a tiny, 4M-parameter code LLM trained for ~38 minutes on a T4 GPU (on Colab)! It's only 16 MB in size, and it's now Llama-based!
Why?
Because making tiny LLMs is a good idea. Some people have already done it with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't yet!
Differences from Preview
- It's now Llama-based; the Preview used a custom architecture
- And of course, it's stable now (it no longer generates gibberish or jumbled words)!
Model Configurations
| Parameter | Value |
|---|---|
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary Size | 4096 tokens |
| Batch Size | 8 per device x 4 gradient accumulation steps = 32 effective |
| Context Window | 2048 tokens |
| hidden_size | 192 |
| intermediate_size | 192 |
| num_hidden_layers | 6 |
| num_attention_heads | 6 |
| max_position_embeddings | 2048 |
| rms_norm_eps | 1e-5 |
| initializer_range | 0.02 |
| use_cache | True |
| tie_word_embeddings | False |
| rope_theta | 10000.0 |
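For reference, here's a minimal sketch of how this configuration could be assembled with Hugging Face transformers. The values come straight from the table above; the imports and class names are just standard transformers usage, not the original training code.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch only: fields taken from the configuration table above.
# vocab_size matches the 4096-token tokenizer.
config = LlamaConfig(
    vocab_size=4096,
    hidden_size=192,
    intermediate_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=2048,
    rms_norm_eps=1e-5,
    initializer_range=0.02,
    use_cache=True,
    tie_word_embeddings=False,
    rope_theta=10000.0,
)

model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```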
Training Configurations
| Hyperparameter | Value |
|---|---|
| output_dir | "./cinnabarlm-v2" |
| max_steps | 10000 |
| per_device_train_batch_size | 8 |
| gradient_accumulation_steps | 4 |
| learning_rate | 6e-4 |
| weight_decay | 0.01 |
| warmup_steps | 500 |
| lr_scheduler_type | "cosine" |
| logging_steps | 100 |
| save_steps | 2000 |
| fp16 | True |
| save_total_limit | 2 |
| prediction_loss_only | True |
| logging_first_step | True |
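As a rough illustration, these hyperparameters map onto Hugging Face TrainingArguments as in the sketch below. This is not the original training script: only the listed values are taken from the table, and the dataset, tokenizer, and collator would still need to be supplied to a Trainer.

```python
from transformers import TrainingArguments

# Hyperparameters from the table above; everything not listed is left at its default.
training_args = TrainingArguments(
    output_dir="./cinnabarlm-v2",
    max_steps=10000,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,   # effective batch size: 8 x 4 = 32
    learning_rate=6e-4,
    weight_decay=0.01,
    warmup_steps=500,
    lr_scheduler_type="cosine",
    logging_steps=100,
    save_steps=2000,
    fp16=True,                       # mixed precision, as used on the T4
    save_total_limit=2,
    prediction_loss_only=True,
    logging_first_step=True,
)
```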
Limitations
- Not Instruction-Tuned: it's only a base model, so it just continues text (see the completion sketch below).
- Python-Only: it was trained exclusively on Python code from The Stack.
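Since it's a base model, you use it by giving it Python code to continue. A minimal sketch with standard transformers generation; the model path here is a placeholder, so substitute the actual local directory or Hub repo id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "path/to/cinnabarlm" is a placeholder, not the real repo id.
tokenizer = AutoTokenizer.from_pretrained("path/to/cinnabarlm")
model = AutoModelForCausalLM.from_pretrained("path/to/cinnabarlm")

prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```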
Some other details
- It was trained on ~70 million tokens of Python code from The Stack
- The name "CinnabarLM" combines "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model)