# ModernBERT-base-MoME-v0

This is a specialized variant of **ModernBERT-base** designed for *Mixture of Multichain Experts* (MoME) routing tasks, in particular determining which chain expert (e.g., Aptos, Ripple, Polkadot, Crust) should handle an incoming transaction or query. It retains the core architectural and performance benefits of ModernBERT while adding custom training on chain-classification data.

---

## Table of Contents
1. [Model Summary](#model-summary)
2. [Usage](#usage)
3. [Evaluation](#evaluation)
4. [Limitations](#limitations)
5. [Training](#training)
6. [License](#license)
7. [Citation](#citation)

---
## Model Summary
**ModernBERT-base-MoME-v0** is an encoder-only (BERT-style) model derived from [ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base). The original ModernBERT was trained on a large corpus of text and code (2T tokens) and supports context lengths of up to 8,192 tokens. Key enhancements include:

- **Rotary Positional Embeddings (RoPE)** for long-context support
- **Local-Global Alternating Attention** for efficient attention over extended sequences
- **Unpadding + Flash Attention** for fast inference

**ModernBERT-base-MoME-v0** extends these capabilities with a fine-tuned head specialized in routing transactions or queries to the correct “chain expert” in a Mixture of Multichain Experts (MoME) system. By integrating specialized training data for chain classification (e.g., Polkadot, Aptos, Ripple, Crust), the model can better determine which chain is relevant for a given transaction payload.

---

## Usage

You can load **ModernBERT-base-MoME-v0** using [Hugging Face Transformers](https://github.com/huggingface/transformers). The steps are largely identical to standard BERT usage, with two key notes (both illustrated in the sketch that follows the Quickstart):

1. **Long-Context Support**
   Thanks to the model's RoPE-based architecture, you can input sequences of up to 8,192 tokens without degrading performance.
2. **Routing Head**
   After the core BERT encoding, a classification head (a specialized projection layer) determines the most likely chain or domain.

### Quickstart

```bash
pip install -U "transformers>=4.48.0"
pip install flash-attn  # optional but recommended if supported by your GPU
```

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "momeaicrypto/ModernBERT-base-MoME-v0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Sample transaction or query
text = "Transaction: {\"action\": \"transfer\", \"chain\": \"polkadot\", ...}"

inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# The logits indicate which chain this transaction most likely belongs to.
print("Logits:", outputs.logits)
predicted_label = outputs.logits.argmax(dim=-1).item()
print("Predicted chain ID:", predicted_label)
```
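
Continuing from the Quickstart, the short sketch below shows how the two notes above play out in code: it reads the routing head's label mapping from `model.config.id2label` (whether this checkpoint populates it with human-readable chain names is an assumption) and tokenizes a long payload explicitly up to the 8,192-token limit. The `long_payload` value is a placeholder.

```python
# Continuing from the Quickstart (reuses `tokenizer`, `model`, `text`, `predicted_label`).

# Routing head: the index -> label mapping of the classification head.
# Human-readable chain names here are an assumption, not guaranteed by the checkpoint.
print("Label mapping:", model.config.id2label)
print("Predicted chain:", model.config.id2label[predicted_label])

# Long-context support: explicitly cap inputs at the 8,192-token limit.
long_payload = " ".join([text] * 200)  # placeholder for a large transaction log
long_inputs = tokenizer(long_payload, truncation=True, max_length=8192, return_tensors="pt")
long_logits = model(**long_inputs).logits
print("Predicted chain ID for long input:", long_logits.argmax(dim=-1).item())
```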

**Note**: If you want to adapt the model to a different classification scheme (e.g., additional chains), you can fine-tune it via standard BERT classification recipes, as sketched below.
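
The note above can be made concrete with a minimal fine-tuning sketch using the standard Hugging Face `Trainer` recipe. Everything beyond the model ID is illustrative: the chain list, the two toy examples, and the hyperparameters are placeholders, and `ignore_mismatched_sizes=True` simply re-initializes the classification head for a different number of labels.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

model_id = "momeaicrypto/ModernBERT-base-MoME-v0"
chains = ["polkadot", "aptos", "ripple", "crust", "solana"]  # hypothetical label set

# Hypothetical labeled data: transaction payloads paired with chain indices.
dataset = Dataset.from_dict({
    "text": ['{"action": "transfer", "chain": "polkadot"}',
             '{"action": "stake", "chain": "aptos"}'],
    "label": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(chains),
    ignore_mismatched_sizes=True,  # re-initialize the head for the new label count
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="mome-chain-router", num_train_epochs=3,
                         per_device_train_batch_size=8, learning_rate=5e-5)
trainer = Trainer(model=model, args=args, train_dataset=tokenized,
                  data_collator=DataCollatorWithPadding(tokenizer))
trainer.train()
```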

---

## Evaluation
The base ModernBERT architecture has been shown to outperform or match other leading encoder-only models across GLUE, BEIR, MLDR, CodeSearchNet, and StackQA. For **ModernBERT-base-MoME-v0**, we specifically evaluate:

- **Chain Classification Accuracy**: Using a specialized dataset of transactions labeled by their respective chains (Polkadot, Aptos, Ripple, Crust, etc.). A minimal accuracy sketch follows this list.
- **Inference Efficiency on Long Inputs**: Verifying that local-global alternating attention and Flash Attention enable high throughput, even for large transaction payloads or logs (up to 8,192 tokens).
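
For chain classification accuracy, a minimal evaluation loop looks like the sketch below. The two labeled examples are hypothetical placeholders rather than the actual evaluation set, and the label indices are assumed to match the model's head.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "momeaicrypto/ModernBERT-base-MoME-v0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Hypothetical labeled evaluation examples: (payload, chain index).
eval_set = [
    ('{"action": "transfer", "chain": "polkadot"}', 0),
    ('{"action": "store_file", "chain": "crust"}', 1),
]

correct = 0
with torch.no_grad():
    for payload, label in eval_set:
        inputs = tokenizer(payload, truncation=True, max_length=8192, return_tensors="pt")
        pred = model(**inputs).logits.argmax(dim=-1).item()
        correct += int(pred == label)

print(f"Chain classification accuracy: {correct / len(eval_set):.2%}")
```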

See the parent ModernBERT evaluation results for a broad performance context:

| Model          | IR (DPR) BEIR | IR (ColBERT) BEIR | NLU (GLUE) | Code (CSN) |
|----------------|---------------|-------------------|------------|------------|
| BERT           | 38.9          | 49.0              | 84.7       | 41.2       |
| RoBERTa        | 37.7          | 48.7              | 86.4       | 44.3       |
| **ModernBERT** | 41.6          | 51.3              | 88.4       | 56.4       |

*ModernBERT-base-MoME-v0* maintains the same strong backbone while adding chain-routing capabilities.

---

## Limitations
1. **Domain-Specific Training**: While it handles chain routing, performance may degrade on data outside the pre-trained or fine-tuned domains (e.g., medical or legal text).
2. **Biases**: As with any large language model, biases in the underlying dataset can manifest in certain classification outcomes.
3. **Context Length**: Though the model accepts sequences of up to 8,192 tokens, very long inputs can be slow on some GPU hardware if Flash Attention is not installed (see the loading sketch after this list).
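
If `flash-attn` is installed and your GPU supports it, you can request it at load time via the standard `attn_implementation` argument. This is a minimal sketch; half-precision weights are used because the Flash Attention kernels require fp16/bf16.

```python
import torch
from transformers import AutoModelForSequenceClassification

model_id = "momeaicrypto/ModernBERT-base-MoME-v0"

# Request Flash Attention 2 for faster long-sequence inference on supported GPUs.
# If flash-attn is not installed, loading will fail; omit the argument in that case.
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,  # Flash Attention kernels require fp16/bf16
).to("cuda")
```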

---

## Training
- **Base Model**: ModernBERT-base (149M parameters, 22 layers).
- **Fine-Tuning**: Additional training on ~1k chain-labeled transactions, focusing on Polkadot, Aptos, Ripple, Crust, etc.
- **Long Context**: Trained with RoPE and local-global alternating attention for efficient extended-context usage.
- **Optimizer**: StableAdamW with trapezoidal LR scheduling, consistent with the original ModernBERT approach (a schedule sketch follows this list).
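
For illustration only, here is a minimal sketch of a trapezoidal (warmup-stable-decay) learning-rate schedule built with PyTorch's `LambdaLR`. Plain `AdamW` stands in for StableAdamW, and the step counts and model are placeholders rather than the actual training configuration.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Placeholder step counts; the real training run uses different values.
warmup_steps, stable_steps, decay_steps = 100, 800, 100
total_steps = warmup_steps + stable_steps + decay_steps

def trapezoidal(step: int) -> float:
    """Linear warmup, flat plateau, then linear decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    if step < warmup_steps + stable_steps:
        return 1.0
    return max(0.0, (total_steps - step) / max(1, decay_steps))

model = torch.nn.Linear(768, 4)                 # stand-in for the routing head
optimizer = AdamW(model.parameters(), lr=5e-5)  # stand-in for StableAdamW
scheduler = LambdaLR(optimizer, lr_lambda=trapezoidal)

for step in range(total_steps):
    optimizer.step()   # (loss.backward() would precede this in real training)
    scheduler.step()
```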

---

## License
This model inherits the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0) from ModernBERT.

---

## Citation
If you use **ModernBERT-base-MoME-v0** in your work, please cite the original ModernBERT:

```bibtex
@misc{modernbert,
  title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
  author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
  year={2024},
  eprint={2412.13663},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.13663},
}
```

Additional references for the **MoME** (Mixture of Multichain Experts) concept should be included if relevant.