aws-neuron/ChessLM_Qwen3_Trainium_2_AWS_Format



jburtoft committed 21 days ago

Commit 1296456 · verified · 1 Parent(s): e158e00

Update README.md

Files changed (1)

README.md +66 -3


@@ -1,3 +1,66 @@
----
-license: apache-2.0
----

+---
+language:
+- en
+license: apache-2.0
+pipeline_tag: text-generation
+tags:
+- chess
+- neuron
+- aws-trainium
+- vllm
+- optimum-neuron
+- continuous-batching
+base_model: karanps/ChessLM_Qwen3
+---
+# ChessLM Qwen3 - Neuron Traced (AWS Format Structure)
+This is a Neuron-traced version of [karanps/ChessLM_Qwen3](https://huggingface.co/karanps/ChessLM_Qwen3), compiled for AWS Trainium2 (trn2) instances for use with vLLM.
+This model follows the AWS Neuron repository structure with separate directories for compiled artifacts.
+This model is intended for use within the [Neuron Workshop](https://github.com/aws-neuron/neuron-workshops).
+## Model Details
+- **Base Model**: Qwen3-8B fine-tuned for chess
+- **Compilation**: optimum-neuron[vllm]==0.3.0
+- **Compiler Version**: neuronxcc 2.21.33363.0
+- **Target Hardware**: AWS Trainium2 (trn2)
+- **Precision**: BF16
+- **Tensor Parallelism**: 2 cores
+- **Batch Size**: 4 (continuous batching enabled)
+- **Max Sequence Length**: 2048
+## Compilation Instructions
+```bash
+optimum-cli export neuron \
+  --model karanps/ChessLM_Qwen3 \
+  --task text-generation \
+  --sequence_length 2048 \
+  --batch_size 4 \
+  /home/ubuntu/environment/ml/qwen-chess/karanps/ChessLM_Qwen3_compiled
+```
+### Key Files
+- **context_encoding_model/**: Compiled NEFF files for processing initial prompt sequences (up to 2048 tokens)
+- **token_generation_model/**: Compiled NEFF files for autoregressive token generation
+- **layout_opt/**: Layout optimization artifacts from compilation
+- **model.pt**: Main model file containing compiled graphs and embedded weights (17 GB)
+- **neuron_config.json**: Neuron compilation configuration
+## Model Files
+| File | Purpose |
+|------|---------|
+| model.pt | Main model with embedded weights (17 GB) |
+| config.json | Base model configuration |
+| neuron_config.json | Neuron compilation settings |
+| tokenizer* | Tokenizer files for text processing |
+## License
+This model inherits the license from the base model [karanps/ChessLM_Qwen3](https://huggingface.co/karanps/ChessLM_Qwen3).
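
The README in this diff lists the compilation settings but shows no serving example. The sketch below, which is not part of the commit, shows one way the values under Model Details could map onto vLLM engine arguments. It assumes vLLM with Neuron support is installed on a trn2 instance and that the engine accepts this repository ID directly; neither is confirmed by the README. `NeuronServingConfig`, `engine_kwargs`, and `generate` are hypothetical helper names.

```python
from dataclasses import asdict, dataclass

# Repository name as shown at the top of this page.
MODEL_ID = "aws-neuron/ChessLM_Qwen3_Trainium_2_AWS_Format"


@dataclass(frozen=True)
class NeuronServingConfig:
    """Values copied from the Model Details list in the README."""

    tensor_parallel_size: int = 2  # "Tensor Parallelism: 2 cores"
    max_num_seqs: int = 4          # "Batch Size: 4 (continuous batching enabled)"
    max_model_len: int = 2048      # "Max Sequence Length: 2048"


def engine_kwargs(cfg: NeuronServingConfig = NeuronServingConfig()) -> dict:
    """Build keyword arguments for vllm.LLM from the compiled settings."""
    kwargs = asdict(cfg)
    kwargs["model"] = MODEL_ID
    return kwargs


def generate(prompts: list[str], max_tokens: int = 32) -> list[str]:
    """Run offline generation; needs vLLM and Neuron hardware (assumption)."""
    # Imported lazily so the config helpers work without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs())
    outputs = llm.generate(prompts, SamplingParams(max_tokens=max_tokens))
    return [out.outputs[0].text for out in outputs]
```

Because the artifacts were compiled for a fixed batch size and sequence length, the serving values are kept identical to the compiled ones here; whether other values would be accepted is not stated in the README.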