ainergiz committed on
Commit 7320ff5 · verified · 1 Parent(s): fce1af4

Upload MedASR MLX int8 model

README.md ADDED
@@ -0,0 +1,152 @@
+ ---
+ language:
+ - en
+ license: other
+ license_name: hai-def-terms-of-use
+ license_link: https://developers.google.com/health-ai-developer-foundations/terms
+ library_name: mlx
+ pipeline_tag: automatic-speech-recognition
+ tags:
+ - mlx
+ - audio
+ - medical
+ - speech-recognition
+ - asr
+ - conformer
+ - ctc
+ - apple-silicon
+ - on-device
+ - healthcare
+ - quantized
+ - int8
+ base_model: ainergiz/medasr-mlx-fp16
+ base_model_relation: quantized
+ model-index:
+ - name: MedASR-MLX-INT8
+   results:
+   - task:
+       type: automatic-speech-recognition
+     metrics:
+     - name: WER parity vs PyTorch
+       type: wer
+       value: 0.0
+     - name: Token Agreement vs PyTorch
+       type: accuracy
+       value: 100.0
+ ---
+
+ # MedASR-MLX (int8)
+
+ **8-bit quantized version of [MedASR-MLX](https://huggingface.co/ainergiz/medasr-mlx-fp16): 40% smaller, with identical output tokens on the benchmark clip.**
+
+ Google's MedASR 105M Conformer-CTC, converted to MLX and quantized to 8-bit affine (group size 64). Runs natively on Apple Silicon with **0.0% WER degradation** vs the original PyTorch model in the benchmark below.
+
+ > **Unquantized fp16 version:** [`ainergiz/medasr-mlx-fp16`](https://huggingface.co/ainergiz/medasr-mlx-fp16) (201 MB)
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | **Base model** | [google/medasr](https://huggingface.co/google/medasr) → [ainergiz/medasr-mlx-fp16](https://huggingface.co/ainergiz/medasr-mlx-fp16) |
+ | **Architecture** | LASR Conformer-CTC (17 layers, 512 hidden, 8 heads) |
+ | **Parameters** | 105M |
+ | **Weights** | 121 MB (int8, affine, group_size=64) |
+ | **Quantization** | 8-bit affine on Linear/Embedding layers; Conv layers remain fp16 |
+ | **Vocab** | 512 tokens (SentencePiece) |
+ | **Audio input** | 16 kHz mono, 128-bin mel spectrogram |
+ | **Framework** | MLX (Apple Silicon native) |
+
+ ## Performance
+
+ Benchmarked on Apple M4 Pro (24 GB), 43.8-second medical audio clip:
+
+ | Metric | int8 (this model) | fp16 | HF PyTorch (fp32) |
+ |--------|--------------------|------|-------------------|
+ | Latency | **0.16 s** | 0.09 s | 0.9-1.6 s |
+ | Real-Time Factor | **0.004** | 0.002 | 0.02-0.04 |
+ | Weights on disk | **121 MB** | 201 MB | ~421 MB |
+ | WER vs PyTorch | **0.0%** | 0.0% | n/a |
+ | Token Agreement | **100%** | 100% | n/a |
+
+ **Lossless on this clip:** 40% smaller than fp16 with identical output tokens.
+
+ ## Quantization Details
+
+ - **Method:** `mlx.nn.quantize` (affine mode)
+ - **Bits:** 8
+ - **Group size:** 64
+ - **Target modules:** Linear and Embedding layers (MLX default predicate)
+ - **Convolution layers:** Remain in float16 (quantizing Conv1d destroys accuracy for this architecture)
+ - **Source:** Quantized from `artifacts/medasr-mlx-fp16`
+
+ ## Usage
+
+ ### Requirements
+
+ ```bash
+ pip install mlx numpy soundfile
+ ```
+
+ You also need the model code from the [MedASR-MLX repository](https://github.com/ainergiz/medasr-mlx):
+
+ ```bash
+ git clone https://github.com/ainergiz/medasr-mlx.git
+ cd medasr-mlx
+ ```
+
+ ### Transcribe audio
+
+ ```python
+ from model import MedASRModel
+ from audio_utils import load_audio_mono
+ import mlx.core as mx
+
+ # Load model (automatically applies int8 quantization from config)
+ model_dir = "artifacts/medasr-mlx-int8" # or download from HF
+ model = MedASRModel.from_pretrained(model_dir)
+
+ # Load audio and run inference (same API as fp16)
+ audio = load_audio_mono("your_audio.wav", target_sr=16000)
+ # ... (see transcribe_mlx.py for full pipeline)
+ ```
+
+ ### Full pipeline
+
+ ```bash
+ python main.py audio/eval-consultation/clip_0001.wav
+ ```
+
+ ## Intended Use
+
+ MedASR is designed for medical speech recognition: doctor-patient conversations, clinical dictation, and medical terminology. This int8 variant suits storage-constrained deployments that still need output matching the fp16 model.
+
+ **This model is not intended for clinical diagnosis or treatment without appropriate validation and regulatory authorization.**
+
+ ## Limitations
+
+ - English only
+ - Optimized for medical domain speech
+ - Raw output includes formatting tokens (`{period}`, `{comma}`, etc.) requiring post-processing
+ - Requires Apple Silicon hardware (M1+ Mac or A17+ iPhone)
+
+ ## License
+
+ The use of this model is governed by the [Health AI Developer Foundations Terms of Use](https://developers.google.com/health-ai-developer-foundations/terms). Source code components are licensed under Apache 2.0.
+
+ > HAI-DEF is provided under and subject to the Health AI Developer Foundations Terms of Use.
+
+ ## Citation
+
+ ```bibtex
+ @misc{medasr-mlx,
+   title={MedASR-MLX: On-Device Medical Speech Recognition for Apple Silicon},
+   author={Ali Ihsan Nergiz},
+   year={2026},
+   url={https://huggingface.co/ainergiz/medasr-mlx-int8}
+ }
+ ```
+
+ ## Acknowledgments
+
+ - [Google Health AI Developer Foundations](https://developers.google.com/health-ai-developer-foundations) for the original MedASR model
+ - [MLX team at Apple](https://github.com/ml-explore/mlx) for the framework
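
The headline figures in the README's performance table can be re-derived from its own numbers. The sketch below is a minimal check and is not part of the commit: the helper names `real_time_factor` and `size_reduction_pct` are made up for illustration, and every input is a value the README reports (a 43.8 s clip, latencies in seconds, sizes in MB).

```python
# Hypothetical helpers for sanity-checking the README's performance table.
# All inputs are figures the README reports; nothing here is measured.

CLIP_SECONDS = 43.8  # length of the benchmark clip


def real_time_factor(latency_s: float, audio_s: float = CLIP_SECONDS) -> float:
    """RTF = processing time / audio duration (lower is faster)."""
    return latency_s / audio_s


def size_reduction_pct(small_mb: float, large_mb: float) -> float:
    """Percentage by which `small_mb` is smaller than `large_mb`."""
    return 100.0 * (large_mb - small_mb) / large_mb


if __name__ == "__main__":
    print(f"int8 RTF: {real_time_factor(0.16):.4f}")
    print(f"fp16 RTF: {real_time_factor(0.09):.4f}")
    print(f"int8 vs fp16 size: {size_reduction_pct(121, 201):.1f}% smaller")
```

The RTF values round to the 0.004 and 0.002 shown in the table, and the size reduction comes out at 39.8%, which the README rounds to 40%. The same table also shows that the int8 variant is slower than fp16 on this clip (0.16 s vs 0.09 s), so the gain is in footprint rather than latency.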
config.json ADDED
@@ -0,0 +1,318 @@
+ {
+   "vocab_size": 512,
+   "ctc_loss_reduction": "mean",
+   "ctc_zero_infinity": true,
+   "encoder_config": {
+     "rope_parameters": {
+       "rope_theta": 10000.0,
+       "rope_type": "default"
+     },
+     "layer_norm_eps": 1e-06,
+     "feed_forward_residual_weights": [
+       1.5,
+       0.5
+     ],
+     "conv_residual_weights": [
+       2.0,
+       1.0
+     ],
+     "batch_norm_momentum": 0.01,
+     "hidden_size": 512,
+     "num_hidden_layers": 17,
+     "num_attention_heads": 8,
+     "num_key_value_heads": 8,
+     "intermediate_size": 2048,
+     "hidden_act": "silu",
+     "attention_bias": false,
+     "convolution_bias": false,
+     "conv_kernel_size": 32,
+     "subsampling_conv_kernel_size": 5,
+     "subsampling_conv_stride": 2,
+     "subsampling_conv_channels": 256,
+     "num_mel_bins": 128,
+     "dropout": 0.1,
+     "dropout_positions": 0.0,
+     "layerdrop": 0.1,
+     "activation_dropout": 0.1,
+     "attention_dropout": 0.1,
+     "max_position_embeddings": 10000,
+     "initializer_range": 0.02,
+     "return_dict": true,
+     "output_hidden_states": false,
+     "dtype": null,
+     "tie_word_embeddings": true,
+     "chunk_size_feed_forward": 0,
+     "is_encoder_decoder": false,
+     "is_decoder": false,
+     "cross_attention_hidden_size": null,
+     "add_cross_attention": false,
+     "tie_encoder_decoder": false,
+     "architectures": null,
+     "finetuning_task": null,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "task_specific_params": null,
+     "problem_type": null,
+     "tokenizer_class": null,
+     "prefix": null,
+     "bos_token_id": null,
+     "pad_token_id": null,
+     "eos_token_id": null,
+     "sep_token_id": null,
+     "decoder_start_token_id": null,
+     "_name_or_path": "",
+     "model_type": "lasr_encoder",
+     "output_attentions": false
+   },
+   "initializer_range": 0.02,
+   "return_dict": true,
+   "output_hidden_states": false,
+   "dtype": "float32",
+   "tie_word_embeddings": true,
+   "chunk_size_feed_forward": 0,
+   "is_encoder_decoder": false,
+   "is_decoder": false,
+   "cross_attention_hidden_size": null,
+   "add_cross_attention": false,
+   "tie_encoder_decoder": false,
+   "architectures": null,
+   "finetuning_task": null,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1"
+   },
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1
+   },
+   "task_specific_params": null,
+   "problem_type": null,
+   "tokenizer_class": null,
+   "prefix": null,
+   "bos_token_id": null,
+   "pad_token_id": 0,
+   "eos_token_id": null,
+   "sep_token_id": null,
+   "decoder_start_token_id": null,
+   "_name_or_path": "google/medasr",
+   "transformers_version": "5.0.0.dev0",
+   "model_type": "lasr_ctc",
+   "output_attentions": false,
+   "_conversion": {
+     "source_model_id": "google/medasr",
+     "timestamp_utc": "2026-02-09T18:36:08.983149+00:00",
+     "weights_dtype": "affine-int8",
+     "converter": "convert.py",
+     "weight_layout_notes": {
+       "linear": "unchanged (out_features, in_features)",
+       "conv1d": "transposed from torch [out, in, kernel] to mlx [out, kernel, in]",
+       "skipped": [
+         "*.num_batches_tracked"
+       ]
+     },
+     "source_weights_dtype": "float16"
+   },
+   "_cache": {
+     "attention": {
+       "hidden_size": 512,
+       "num_heads": 8,
+       "head_dim": 64,
+       "is_causal": false,
+       "scaling": 0.125
+     },
+     "convolution": {
+       "num_layers": 17,
+       "depthwise": [
+         {
+           "layer": 0,
+           "name": "encoder.layers.0.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 1,
+           "name": "encoder.layers.1.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 2,
+           "name": "encoder.layers.2.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 3,
+           "name": "encoder.layers.3.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 4,
+           "name": "encoder.layers.4.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 5,
+           "name": "encoder.layers.5.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 6,
+           "name": "encoder.layers.6.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 7,
+           "name": "encoder.layers.7.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 8,
+           "name": "encoder.layers.8.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 9,
+           "name": "encoder.layers.9.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 10,
+           "name": "encoder.layers.10.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 11,
+           "name": "encoder.layers.11.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 12,
+           "name": "encoder.layers.12.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 13,
+           "name": "encoder.layers.13.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 14,
+           "name": "encoder.layers.14.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 15,
+           "name": "encoder.layers.15.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         },
+         {
+           "layer": 16,
+           "name": "encoder.layers.16.conv.depthwise_conv",
+           "kernel_size": 32,
+           "cache_size_frames": 31,
+           "padding": "same",
+           "appears_causal": false
+         }
+       ]
+     },
+     "subsampling": {
+       "convs": [
+         {
+           "name": "encoder.subsampler.conv_0",
+           "stride": 2,
+           "kernel_size": 5
+         },
+         {
+           "name": "encoder.subsampler.conv_1",
+           "stride": 2,
+           "kernel_size": 5
+         }
+       ],
+       "estimated_factor": 4
+     },
+     "positional_encoding": {
+       "type": "LasrEncoderRotaryEmbedding",
+       "max_seq_len_cached": 10000,
+       "inv_freq_shape": [
+         32
+       ]
+     }
+   },
+   "_audio": {
+     "feature_size": 128,
+     "sampling_rate": 16000,
+     "padding_value": 0.0,
+     "padding_side": "right",
+     "return_attention_mask": true,
+     "_processor_class": "LasrProcessor",
+     "feature_extractor_type": "LasrFeatureExtractor",
+     "hop_length": 160,
+     "n_fft": 512,
+     "win_length": 400
+   },
+   "_quantization": {
+     "enabled": true,
+     "bits": 8,
+     "group_size": 64,
+     "mode": "affine",
+     "target_modules": "mlx.nn.quantize default predicate (Linear/Embedding layers)",
+     "source_model_dir": "artifacts/medasr-mlx-fp16",
+     "timestamp_utc": "2026-02-09T20:38:09.851556+00:00",
+     "notes": [
+       "Convolution layers remain non-quantized with this path.",
+       "Model loader applies nn.quantize() before loading quantized weights."
+     ]
+   }
+ }
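
The `_quantization` block at the end of config.json records 8-bit affine quantization with a group size of 64. As a rough illustration of what such a scheme does, the sketch below quantizes one group of weights in plain Python. It is a textbook affine (scale plus offset) quantizer written for clarity, not MLX's implementation; `quantize_group`, `dequantize_group`, and `bits_per_weight` are hypothetical names, and the storage estimate assumes one fp16 scale and one fp16 bias per group.

```python
# Illustrative affine quantization of one weight group (NOT MLX's code).
# Assumption: each group stores a scale and a bias alongside its integer codes.

BITS = 8
GROUP_SIZE = 64


def quantize_group(weights, bits=BITS):
    """Map floats to integer codes in [0, 2**bits - 1] with a shared scale/bias."""
    levels = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0  # fall back to 1.0 for constant groups
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats: w ~ code * scale + bias."""
    return [c * scale + bias for c in codes]


def bits_per_weight(bits=BITS, group_size=GROUP_SIZE, overhead_bits=32):
    """Effective storage per weight, assuming fp16 scale + fp16 bias per group."""
    return bits + overhead_bits / group_size


if __name__ == "__main__":
    group = [(i - 32) / 100.0 for i in range(GROUP_SIZE)]  # toy weights
    codes, scale, bias = quantize_group(group)
    restored = dequantize_group(codes, scale, bias)
    max_err = max(abs(a - b) for a, b in zip(group, restored))
    print(f"max reconstruction error: {max_err:.6f} (scale {scale:.6f})")
    print(f"effective bits per weight: {bits_per_weight()}")
```

Under these assumptions a quantized layer costs 8.5 bits per weight against 16 for fp16, roughly 53% of the fp16 size. The overall 121 MB vs 201 MB reported in the README is a smaller saving than that, which is consistent with the note that convolution layers stay unquantized.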
processor/processor_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "feature_extractor": {
+     "feature_extractor_type": "LasrFeatureExtractor",
+     "feature_size": 128,
+     "hop_length": 160,
+     "n_fft": 512,
+     "padding_side": "right",
+     "padding_value": 0.0,
+     "processor_class": "LasrProcessor",
+     "return_attention_mask": true,
+     "sampling_rate": 16000,
+     "win_length": 400
+   },
+   "processor_class": "LasrProcessor"
+ }
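
The feature-extractor settings above fix the time resolution of the model's input. The sketch below works through that arithmetic. It assumes the overall subsampling factor of 4 recorded as `estimated_factor` in config.json, and it ignores padding and windowing edge effects, so frame counts are approximate. The function names are illustrative only.

```python
# Back-of-the-envelope timing derived from processor_config.json.
# Assumptions: overall subsampling factor 4 (config.json "estimated_factor");
# edge effects from padding/windowing are ignored.

SAMPLING_RATE = 16_000  # Hz
HOP_LENGTH = 160        # samples between consecutive mel frames
WIN_LENGTH = 400        # samples per analysis window
SUBSAMPLING = 4         # two stride-2 convolutions in the encoder front end


def mel_frames_per_second(sr=SAMPLING_RATE, hop=HOP_LENGTH):
    """Number of mel frames produced per second of audio."""
    return sr / hop


def encoder_frame_ms(sr=SAMPLING_RATE, hop=HOP_LENGTH, factor=SUBSAMPLING):
    """Audio time covered by one encoder output step, in milliseconds."""
    return 1000.0 * hop * factor / sr


def approx_encoder_frames(seconds, sr=SAMPLING_RATE, hop=HOP_LENGTH, factor=SUBSAMPLING):
    """Approximate number of encoder steps for a clip of the given length."""
    return round(seconds * sr) // (hop * factor)


if __name__ == "__main__":
    print("mel frames per second:", mel_frames_per_second())
    print("window length (ms):", 1000.0 * WIN_LENGTH / SAMPLING_RATE)
    print("audio per encoder step (ms):", encoder_frame_ms())
    print("encoder steps for a 43.8 s clip:", approx_encoder_frames(43.8))
```

With these settings the extractor emits 100 mel frames per second from 25 ms windows, and each encoder step covers about 40 ms of audio, so a 43.8-second clip yields on the order of 1,095 CTC output steps.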
processor/tokenizer.json ADDED
@@ -0,0 +1,3082 @@
1
+ {
2
+ "version": "1.0",
3
+ "truncation": null,
4
+ "padding": null,
5
+ "added_tokens": [
6
+ {
7
+ "id": 0,
8
+ "content": "<epsilon>",
9
+ "single_word": false,
10
+ "lstrip": false,
11
+ "rstrip": false,
12
+ "normalized": false,
13
+ "special": true
14
+ },
15
+ {
16
+ "id": 1,
17
+ "content": "<s>",
18
+ "single_word": false,
19
+ "lstrip": false,
20
+ "rstrip": false,
21
+ "normalized": false,
22
+ "special": true
23
+ },
24
+ {
25
+ "id": 2,
26
+ "content": "</s>",
27
+ "single_word": false,
28
+ "lstrip": false,
29
+ "rstrip": false,
30
+ "normalized": false,
31
+ "special": true
32
+ },
33
+ {
34
+ "id": 3,
35
+ "content": "<unk>",
36
+ "single_word": false,
37
+ "lstrip": false,
38
+ "rstrip": false,
39
+ "normalized": false,
40
+ "special": true
41
+ },
42
+ {
43
+ "id": 512,
44
+ "content": "<pad>",
45
+ "single_word": false,
46
+ "lstrip": false,
47
+ "rstrip": false,
48
+ "normalized": false,
49
+ "special": true
50
+ },
51
+ {
52
+ "id": 513,
53
+ "content": "<extra_id_0>",
54
+ "single_word": false,
55
+ "lstrip": false,
56
+ "rstrip": false,
57
+ "normalized": false,
58
+ "special": true
59
+ },
60
+ {
61
+ "id": 514,
62
+ "content": "<extra_id_1>",
63
+ "single_word": false,
64
+ "lstrip": false,
65
+ "rstrip": false,
66
+ "normalized": false,
67
+ "special": true
68
+ },
69
+ {
70
+ "id": 515,
71
+ "content": "<extra_id_2>",
72
+ "single_word": false,
73
+ "lstrip": false,
74
+ "rstrip": false,
75
+ "normalized": false,
76
+ "special": true
77
+ },
78
+ {
79
+ "id": 516,
80
+ "content": "<extra_id_3>",
81
+ "single_word": false,
82
+ "lstrip": false,
83
+ "rstrip": false,
84
+ "normalized": false,
85
+ "special": true
86
+ },
87
+ {
88
+ "id": 517,
89
+ "content": "<extra_id_4>",
90
+ "single_word": false,
91
+ "lstrip": false,
92
+ "rstrip": false,
93
+ "normalized": false,
94
+ "special": true
95
+ },
96
+ {
97
+ "id": 518,
98
+ "content": "<extra_id_5>",
99
+ "single_word": false,
100
+ "lstrip": false,
101
+ "rstrip": false,
102
+ "normalized": false,
103
+ "special": true
104
+ },
105
+ {
106
+ "id": 519,
107
+ "content": "<extra_id_6>",
108
+ "single_word": false,
109
+ "lstrip": false,
110
+ "rstrip": false,
111
+ "normalized": false,
112
+ "special": true
113
+ },
114
+ {
115
+ "id": 520,
116
+ "content": "<extra_id_7>",
117
+ "single_word": false,
118
+ "lstrip": false,
119
+ "rstrip": false,
120
+ "normalized": false,
121
+ "special": true
122
+ },
123
+ {
124
+ "id": 521,
125
+ "content": "<extra_id_8>",
126
+ "single_word": false,
127
+ "lstrip": false,
128
+ "rstrip": false,
129
+ "normalized": false,
130
+ "special": true
131
+ },
132
+ {
133
+ "id": 522,
134
+ "content": "<extra_id_9>",
135
+ "single_word": false,
136
+ "lstrip": false,
137
+ "rstrip": false,
138
+ "normalized": false,
139
+ "special": true
140
+ },
141
+ {
142
+ "id": 523,
143
+ "content": "<extra_id_10>",
144
+ "single_word": false,
145
+ "lstrip": false,
146
+ "rstrip": false,
147
+ "normalized": false,
148
+ "special": true
149
+ },
150
+ {
151
+ "id": 524,
152
+ "content": "<extra_id_11>",
153
+ "single_word": false,
154
+ "lstrip": false,
155
+ "rstrip": false,
156
+ "normalized": false,
157
+ "special": true
158
+ },
159
+ {
160
+ "id": 525,
161
+ "content": "<extra_id_12>",
162
+ "single_word": false,
163
+ "lstrip": false,
164
+ "rstrip": false,
165
+ "normalized": false,
166
+ "special": true
167
+ },
168
+ {
169
+ "id": 526,
170
+ "content": "<extra_id_13>",
171
+ "single_word": false,
172
+ "lstrip": false,
173
+ "rstrip": false,
174
+ "normalized": false,
175
+ "special": true
176
+ },
177
+ {
178
+ "id": 527,
179
+ "content": "<extra_id_14>",
180
+ "single_word": false,
181
+ "lstrip": false,
182
+ "rstrip": false,
183
+ "normalized": false,
184
+ "special": true
185
+ },
186
+ {
187
+ "id": 528,
188
+ "content": "<extra_id_15>",
189
+ "single_word": false,
190
+ "lstrip": false,
191
+ "rstrip": false,
192
+ "normalized": false,
193
+ "special": true
194
+ },
195
+ {
196
+ "id": 529,
197
+ "content": "<extra_id_16>",
198
+ "single_word": false,
199
+ "lstrip": false,
200
+ "rstrip": false,
201
+ "normalized": false,
202
+ "special": true
203
+ },
204
+ {
205
+ "id": 530,
206
+ "content": "<extra_id_17>",
207
+ "single_word": false,
208
+ "lstrip": false,
209
+ "rstrip": false,
210
+ "normalized": false,
211
+ "special": true
212
+ },
213
+ {
214
+ "id": 531,
215
+ "content": "<extra_id_18>",
216
+ "single_word": false,
217
+ "lstrip": false,
218
+ "rstrip": false,
219
+ "normalized": false,
220
+ "special": true
221
+ },
222
+ {
223
+ "id": 532,
224
+ "content": "<extra_id_19>",
225
+ "single_word": false,
226
+ "lstrip": false,
227
+ "rstrip": false,
228
+ "normalized": false,
229
+ "special": true
230
+ },
231
+ {
232
+ "id": 533,
233
+ "content": "<extra_id_20>",
234
+ "single_word": false,
235
+ "lstrip": false,
236
+ "rstrip": false,
237
+ "normalized": false,
238
+ "special": true
239
+ },
240
+ {
241
+ "id": 534,
242
+ "content": "<extra_id_21>",
243
+ "single_word": false,
244
+ "lstrip": false,
245
+ "rstrip": false,
246
+ "normalized": false,
247
+ "special": true
248
+ },
249
+ {
250
+ "id": 535,
251
+ "content": "<extra_id_22>",
252
+ "single_word": false,
253
+ "lstrip": false,
254
+ "rstrip": false,
255
+ "normalized": false,
256
+ "special": true
257
+ },
258
+ {
259
+ "id": 536,
260
+ "content": "<extra_id_23>",
261
+ "single_word": false,
262
+ "lstrip": false,
263
+ "rstrip": false,
264
+ "normalized": false,
265
+ "special": true
266
+ },
267
+ {
268
+ "id": 537,
269
+ "content": "<extra_id_24>",
270
+ "single_word": false,
271
+ "lstrip": false,
272
+ "rstrip": false,
273
+ "normalized": false,
274
+ "special": true
275
+ },
276
+ {
277
+ "id": 538,
278
+ "content": "<extra_id_25>",
279
+ "single_word": false,
280
+ "lstrip": false,
281
+ "rstrip": false,
282
+ "normalized": false,
283
+ "special": true
284
+ },
285
+ {
286
+ "id": 539,
287
+ "content": "<extra_id_26>",
288
+ "single_word": false,
289
+ "lstrip": false,
290
+ "rstrip": false,
291
+ "normalized": false,
292
+ "special": true
293
+ },
294
+ {
295
+ "id": 540,
296
+ "content": "<extra_id_27>",
297
+ "single_word": false,
298
+ "lstrip": false,
299
+ "rstrip": false,
300
+ "normalized": false,
301
+ "special": true
302
+ },
303
+ {
304
+ "id": 541,
305
+ "content": "<extra_id_28>",
306
+ "single_word": false,
307
+ "lstrip": false,
308
+ "rstrip": false,
309
+ "normalized": false,
310
+ "special": true
311
+ },
312
+ {
313
+ "id": 542,
314
+ "content": "<extra_id_29>",
315
+ "single_word": false,
316
+ "lstrip": false,
317
+ "rstrip": false,
318
+ "normalized": false,
319
+ "special": true
320
+ },
321
+ {
322
+ "id": 543,
323
+ "content": "<extra_id_30>",
324
+ "single_word": false,
325
+ "lstrip": false,
326
+ "rstrip": false,
327
+ "normalized": false,
328
+ "special": true
329
+ },
330
+ {
331
+ "id": 544,
332
+ "content": "<extra_id_31>",
333
+ "single_word": false,
334
+ "lstrip": false,
335
+ "rstrip": false,
336
+ "normalized": false,
337
+ "special": true
338
+ },
339
+ {
340
+ "id": 545,
341
+ "content": "<extra_id_32>",
342
+ "single_word": false,
343
+ "lstrip": false,
344
+ "rstrip": false,
345
+ "normalized": false,
346
+ "special": true
347
+ },
348
+ {
349
+ "id": 546,
350
+ "content": "<extra_id_33>",
351
+ "single_word": false,
352
+ "lstrip": false,
353
+ "rstrip": false,
354
+ "normalized": false,
355
+ "special": true
356
+ },
357
+ {
358
+ "id": 547,
359
+ "content": "<extra_id_34>",
360
+ "single_word": false,
361
+ "lstrip": false,
362
+ "rstrip": false,
363
+ "normalized": false,
364
+ "special": true
365
+ },
366
+ {
367
+ "id": 548,
368
+ "content": "<extra_id_35>",
369
+ "single_word": false,
370
+ "lstrip": false,
371
+ "rstrip": false,
372
+ "normalized": false,
373
+ "special": true
374
+ },
375
+ {
376
+ "id": 549,
377
+ "content": "<extra_id_36>",
378
+ "single_word": false,
379
+ "lstrip": false,
380
+ "rstrip": false,
381
+ "normalized": false,
382
+ "special": true
383
+ },
384
+ {
385
+ "id": 550,
386
+ "content": "<extra_id_37>",
387
+ "single_word": false,
388
+ "lstrip": false,
389
+ "rstrip": false,
390
+ "normalized": false,
391
+ "special": true
392
+ },
393
+ {
394
+ "id": 551,
395
+ "content": "<extra_id_38>",
396
+ "single_word": false,
397
+ "lstrip": false,
398
+ "rstrip": false,
399
+ "normalized": false,
400
+ "special": true
401
+ },
402
+ {
403
+ "id": 552,
404
+ "content": "<extra_id_39>",
405
+ "single_word": false,
406
+ "lstrip": false,
407
+ "rstrip": false,
408
+ "normalized": false,
409
+ "special": true
410
+ },
411
+ {
412
+ "id": 553,
413
+ "content": "<extra_id_40>",
414
+ "single_word": false,
415
+ "lstrip": false,
416
+ "rstrip": false,
417
+ "normalized": false,
418
+ "special": true
419
+ },
420
+ {
421
+ "id": 554,
422
+ "content": "<extra_id_41>",
423
+ "single_word": false,
424
+ "lstrip": false,
425
+ "rstrip": false,
426
+ "normalized": false,
427
+ "special": true
428
+ },
429
+ {
430
+ "id": 555,
431
+ "content": "<extra_id_42>",
432
+ "single_word": false,
433
+ "lstrip": false,
434
+ "rstrip": false,
435
+ "normalized": false,
436
+ "special": true
437
+ },
438
+ {
439
+ "id": 556,
440
+ "content": "<extra_id_43>",
441
+ "single_word": false,
442
+ "lstrip": false,
443
+ "rstrip": false,
444
+ "normalized": false,
445
+ "special": true
446
+ },
447
+ {
448
+ "id": 557,
449
+ "content": "<extra_id_44>",
450
+ "single_word": false,
451
+ "lstrip": false,
452
+ "rstrip": false,
453
+ "normalized": false,
454
+ "special": true
455
+ },
456
+ {
457
+ "id": 558,
458
+ "content": "<extra_id_45>",
459
+ "single_word": false,
460
+ "lstrip": false,
461
+ "rstrip": false,
462
+ "normalized": false,
463
+ "special": true
464
+ },
465
+ {
466
+ "id": 559,
467
+ "content": "<extra_id_46>",
468
+ "single_word": false,
469
+ "lstrip": false,
470
+ "rstrip": false,
471
+ "normalized": false,
472
+ "special": true
473
+ },
474
+ {
475
+ "id": 560,
476
+ "content": "<extra_id_47>",
477
+ "single_word": false,
478
+ "lstrip": false,
479
+ "rstrip": false,
480
+ "normalized": false,
481
+ "special": true
482
+ },
483
+ {
484
+ "id": 561,
485
+ "content": "<extra_id_48>",
486
+ "single_word": false,
487
+ "lstrip": false,
488
+ "rstrip": false,
489
+ "normalized": false,
490
+ "special": true
491
+ },
492
+ {
493
+ "id": 562,
494
+ "content": "<extra_id_49>",
495
+ "single_word": false,
496
+ "lstrip": false,
497
+ "rstrip": false,
498
+ "normalized": false,
499
+ "special": true
500
+ },
501
+ {
502
+ "id": 563,
503
+ "content": "<extra_id_50>",
504
+ "single_word": false,
505
+ "lstrip": false,
506
+ "rstrip": false,
507
+ "normalized": false,
508
+ "special": true
509
+ },
510
+ {
511
+ "id": 564,
512
+ "content": "<extra_id_51>",
513
+ "single_word": false,
514
+ "lstrip": false,
515
+ "rstrip": false,
516
+ "normalized": false,
517
+ "special": true
518
+ },
519
+ {
520
+ "id": 565,
521
+ "content": "<extra_id_52>",
522
+ "single_word": false,
523
+ "lstrip": false,
524
+ "rstrip": false,
525
+ "normalized": false,
526
+ "special": true
527
+ },
528
+ {
529
+ "id": 566,
530
+ "content": "<extra_id_53>",
531
+ "single_word": false,
532
+ "lstrip": false,
533
+ "rstrip": false,
534
+ "normalized": false,
535
+ "special": true
536
+ },
537
+ {
538
+ "id": 567,
539
+ "content": "<extra_id_54>",
540
+ "single_word": false,
541
+ "lstrip": false,
542
+ "rstrip": false,
543
+ "normalized": false,
544
+ "special": true
545
+ },
546
+ {
547
+ "id": 568,
548
+ "content": "<extra_id_55>",
549
+ "single_word": false,
550
+ "lstrip": false,
551
+ "rstrip": false,
552
+ "normalized": false,
553
+ "special": true
554
+ },
555
+ {
556
+ "id": 569,
557
+ "content": "<extra_id_56>",
558
+ "single_word": false,
559
+ "lstrip": false,
560
+ "rstrip": false,
561
+ "normalized": false,
562
+ "special": true
563
+ },
564
+ {
565
+ "id": 570,
566
+ "content": "<extra_id_57>",
567
+ "single_word": false,
568
+ "lstrip": false,
569
+ "rstrip": false,
570
+ "normalized": false,
571
+ "special": true
572
+ },
573
+ {
574
+ "id": 571,
575
+ "content": "<extra_id_58>",
576
+ "single_word": false,
577
+ "lstrip": false,
578
+ "rstrip": false,
579
+ "normalized": false,
580
+ "special": true
581
+ },
582
+ {
583
+ "id": 572,
584
+ "content": "<extra_id_59>",
585
+ "single_word": false,
586
+ "lstrip": false,
587
+ "rstrip": false,
588
+ "normalized": false,
589
+ "special": true
590
+ },
591
+ {
592
+ "id": 573,
593
+ "content": "<extra_id_60>",
594
+ "single_word": false,
595
+ "lstrip": false,
596
+ "rstrip": false,
597
+ "normalized": false,
598
+ "special": true
599
+ },
600
+ {
601
+ "id": 574,
602
+ "content": "<extra_id_61>",
603
+ "single_word": false,
604
+ "lstrip": false,
605
+ "rstrip": false,
606
+ "normalized": false,
607
+ "special": true
608
+ },
609
+ {
610
+ "id": 575,
611
+ "content": "<extra_id_62>",
612
+ "single_word": false,
613
+ "lstrip": false,
614
+ "rstrip": false,
615
+ "normalized": false,
616
+ "special": true
617
+ },
618
+ {
619
+ "id": 576,
620
+ "content": "<extra_id_63>",
621
+ "single_word": false,
622
+ "lstrip": false,
623
+ "rstrip": false,
624
+ "normalized": false,
625
+ "special": true
626
+ },
627
+ {
628
+ "id": 577,
629
+ "content": "<extra_id_64>",
630
+ "single_word": false,
631
+ "lstrip": false,
632
+ "rstrip": false,
633
+ "normalized": false,
634
+ "special": true
635
+ },
636
+ {
637
+ "id": 578,
638
+ "content": "<extra_id_65>",
639
+ "single_word": false,
640
+ "lstrip": false,
641
+ "rstrip": false,
642
+ "normalized": false,
643
+ "special": true
644
+ },
645
+ {
646
+ "id": 579,
647
+ "content": "<extra_id_66>",
648
+ "single_word": false,
649
+ "lstrip": false,
650
+ "rstrip": false,
651
+ "normalized": false,
652
+ "special": true
653
+ },
654
+ {
655
+ "id": 580,
656
+ "content": "<extra_id_67>",
657
+ "single_word": false,
658
+ "lstrip": false,
659
+ "rstrip": false,
660
+ "normalized": false,
661
+ "special": true
662
+ },
663
+ {
664
+ "id": 581,
665
+ "content": "<extra_id_68>",
666
+ "single_word": false,
667
+ "lstrip": false,
668
+ "rstrip": false,
669
+ "normalized": false,
670
+ "special": true
671
+ },
672
+ {
673
+ "id": 582,
674
+ "content": "<extra_id_69>",
675
+ "single_word": false,
676
+ "lstrip": false,
677
+ "rstrip": false,
678
+ "normalized": false,
679
+ "special": true
680
+ },
681
+ {
682
+ "id": 583,
683
+ "content": "<extra_id_70>",
684
+ "single_word": false,
685
+ "lstrip": false,
686
+ "rstrip": false,
687
+ "normalized": false,
688
+ "special": true
689
+ },
690
+ {
691
+ "id": 584,
692
+ "content": "<extra_id_71>",
693
+ "single_word": false,
694
+ "lstrip": false,
695
+ "rstrip": false,
696
+ "normalized": false,
697
+ "special": true
698
+ },
699
+ {
700
+ "id": 585,
701
+ "content": "<extra_id_72>",
702
+ "single_word": false,
703
+ "lstrip": false,
704
+ "rstrip": false,
705
+ "normalized": false,
706
+ "special": true
707
+ },
708
+ {
709
+ "id": 586,
710
+ "content": "<extra_id_73>",
711
+ "single_word": false,
712
+ "lstrip": false,
713
+ "rstrip": false,
714
+ "normalized": false,
715
+ "special": true
716
+ },
717
+ {
718
+ "id": 587,
719
+ "content": "<extra_id_74>",
720
+ "single_word": false,
721
+ "lstrip": false,
722
+ "rstrip": false,
723
+ "normalized": false,
724
+ "special": true
725
+ },
726
+ {
727
+ "id": 588,
728
+ "content": "<extra_id_75>",
729
+ "single_word": false,
730
+ "lstrip": false,
731
+ "rstrip": false,
732
+ "normalized": false,
733
+ "special": true
734
+ },
735
+ {
736
+ "id": 589,
737
+ "content": "<extra_id_76>",
738
+ "single_word": false,
739
+ "lstrip": false,
740
+ "rstrip": false,
741
+ "normalized": false,
742
+ "special": true
743
+ },
744
+ {
745
+ "id": 590,
746
+ "content": "<extra_id_77>",
747
+ "single_word": false,
748
+ "lstrip": false,
749
+ "rstrip": false,
750
+ "normalized": false,
751
+ "special": true
752
+ },
753
+ {
754
+ "id": 591,
755
+ "content": "<extra_id_78>",
756
+ "single_word": false,
757
+ "lstrip": false,
758
+ "rstrip": false,
759
+ "normalized": false,
760
+ "special": true
761
+ },
762
+ {
763
+ "id": 592,
764
+ "content": "<extra_id_79>",
765
+ "single_word": false,
766
+ "lstrip": false,
767
+ "rstrip": false,
768
+ "normalized": false,
769
+ "special": true
770
+ },
771
+ {
772
+ "id": 593,
773
+ "content": "<extra_id_80>",
774
+ "single_word": false,
775
+ "lstrip": false,
776
+ "rstrip": false,
777
+ "normalized": false,
778
+ "special": true
779
+ },
780
+ {
781
+ "id": 594,
782
+ "content": "<extra_id_81>",
783
+ "single_word": false,
784
+ "lstrip": false,
785
+ "rstrip": false,
786
+ "normalized": false,
787
+ "special": true
788
+ },
789
+ {
790
+ "id": 595,
791
+ "content": "<extra_id_82>",
792
+ "single_word": false,
793
+ "lstrip": false,
794
+ "rstrip": false,
795
+ "normalized": false,
796
+ "special": true
797
+ },
798
+ {
799
+ "id": 596,
800
+ "content": "<extra_id_83>",
801
+ "single_word": false,
802
+ "lstrip": false,
803
+ "rstrip": false,
804
+ "normalized": false,
805
+ "special": true
806
+ },
807
+ {
808
+ "id": 597,
809
+ "content": "<extra_id_84>",
810
+ "single_word": false,
811
+ "lstrip": false,
812
+ "rstrip": false,
813
+ "normalized": false,
814
+ "special": true
815
+ },
816
+ {
817
+ "id": 598,
818
+ "content": "<extra_id_85>",
819
+ "single_word": false,
820
+ "lstrip": false,
821
+ "rstrip": false,
822
+ "normalized": false,
823
+ "special": true
824
+ },
825
+ {
826
+ "id": 599,
827
+ "content": "<extra_id_86>",
828
+ "single_word": false,
829
+ "lstrip": false,
830
+ "rstrip": false,
831
+ "normalized": false,
832
+ "special": true
833
+ },
834
+ {
835
+ "id": 600,
836
+ "content": "<extra_id_87>",
837
+ "single_word": false,
838
+ "lstrip": false,
839
+ "rstrip": false,
840
+ "normalized": false,
841
+ "special": true
842
+ },
843
+ {
844
+ "id": 601,
845
+ "content": "<extra_id_88>",
846
+ "single_word": false,
847
+ "lstrip": false,
848
+ "rstrip": false,
849
+ "normalized": false,
850
+ "special": true
851
+ },
852
+ {
853
+ "id": 602,
854
+ "content": "<extra_id_89>",
855
+ "single_word": false,
856
+ "lstrip": false,
857
+ "rstrip": false,
858
+ "normalized": false,
859
+ "special": true
860
+ },
861
+ {
862
+ "id": 603,
863
+ "content": "<extra_id_90>",
864
+ "single_word": false,
865
+ "lstrip": false,
866
+ "rstrip": false,
867
+ "normalized": false,
868
+ "special": true
869
+ },
870
+ {
871
+ "id": 604,
872
+ "content": "<extra_id_91>",
873
+ "single_word": false,
874
+ "lstrip": false,
875
+ "rstrip": false,
876
+ "normalized": false,
877
+ "special": true
878
+ },
879
+ {
880
+ "id": 605,
881
+ "content": "<extra_id_92>",
882
+ "single_word": false,
883
+ "lstrip": false,
884
+ "rstrip": false,
885
+ "normalized": false,
886
+ "special": true
887
+ },
888
+ {
889
+ "id": 606,
890
+ "content": "<extra_id_93>",
891
+ "single_word": false,
892
+ "lstrip": false,
893
+ "rstrip": false,
894
+ "normalized": false,
895
+ "special": true
896
+ },
897
+ {
898
+ "id": 607,
899
+ "content": "<extra_id_94>",
900
+ "single_word": false,
901
+ "lstrip": false,
902
+ "rstrip": false,
903
+ "normalized": false,
904
+ "special": true
905
+ },
906
+ {
907
+ "id": 608,
908
+ "content": "<extra_id_95>",
909
+ "single_word": false,
910
+ "lstrip": false,
911
+ "rstrip": false,
912
+ "normalized": false,
913
+ "special": true
914
+ },
915
+ {
916
+ "id": 609,
917
+ "content": "<extra_id_96>",
918
+ "single_word": false,
919
+ "lstrip": false,
920
+ "rstrip": false,
921
+ "normalized": false,
922
+ "special": true
923
+ },
924
+ {
925
+ "id": 610,
926
+ "content": "<extra_id_97>",
927
+ "single_word": false,
928
+ "lstrip": false,
929
+ "rstrip": false,
930
+ "normalized": false,
931
+ "special": true
932
+ },
933
+ {
934
+ "id": 611,
935
+ "content": "<extra_id_98>",
936
+ "single_word": false,
937
+ "lstrip": false,
938
+ "rstrip": false,
939
+ "normalized": false,
940
+ "special": true
941
+ },
942
+ {
943
+ "id": 612,
944
+ "content": "<extra_id_99>",
945
+ "single_word": false,
946
+ "lstrip": false,
947
+ "rstrip": false,
948
+ "normalized": false,
949
+ "special": true
950
+ }
951
+ ],
952
+ "normalizer": null,
953
+ "pre_tokenizer": {
954
+ "type": "Sequence",
955
+ "pretokenizers": [
956
+ {
957
+ "type": "WhitespaceSplit"
958
+ },
959
+ {
960
+ "type": "Metaspace",
961
+ "replacement": "▁",
962
+ "prepend_scheme": "always",
963
+ "split": true
964
+ }
965
+ ]
966
+ },
967
+ "post_processor": {
968
+ "type": "TemplateProcessing",
969
+ "single": [
970
+ {
971
+ "Sequence": {
972
+ "id": "A",
973
+ "type_id": 0
974
+ }
975
+ },
976
+ {
977
+ "SpecialToken": {
978
+ "id": "</s>",
979
+ "type_id": 0
980
+ }
981
+ }
982
+ ],
983
+ "pair": [
984
+ {
985
+ "Sequence": {
986
+ "id": "A",
987
+ "type_id": 0
988
+ }
989
+ },
990
+ {
991
+ "SpecialToken": {
992
+ "id": "</s>",
993
+ "type_id": 0
994
+ }
995
+ },
996
+ {
997
+ "Sequence": {
998
+ "id": "B",
999
+ "type_id": 0
1000
+ }
1001
+ },
1002
+ {
1003
+ "SpecialToken": {
1004
+ "id": "</s>",
1005
+ "type_id": 0
1006
+ }
1007
+ }
1008
+ ],
1009
+ "special_tokens": {
1010
+ "</s>": {
1011
+ "id": "</s>",
1012
+ "ids": [
1013
+ 2
1014
+ ],
1015
+ "tokens": [
1016
+ "</s>"
1017
+ ]
1018
+ }
1019
+ }
1020
+ },
1021
+ "decoder": {
1022
+ "type": "Metaspace",
1023
+ "replacement": "▁",
1024
+ "prepend_scheme": "always",
1025
+ "split": true
1026
+ },
1027
+ "model": {
1028
+ "type": "Unigram",
1029
+ "unk_id": 3,
1030
+ "vocab": [
1031
+ [
1032
+ "<epsilon>",
1033
+ 0.0
1034
+ ],
1035
+ [
1036
+ "<s>",
1037
+ 0.0
1038
+ ],
1039
+ [
1040
+ "</s>",
1041
+ 0.0
1042
+ ],
1043
+ [
1044
+ "<unk>",
1045
+ 0.0
1046
+ ],
1047
+ [
1048
+ "▁",
1049
+ -3.1446692943573
1050
+ ],
1051
+ [
1052
+ "s",
1053
+ -3.203380823135376
1054
+ ],
1055
+ [
1056
+ ",",
1057
+ -3.58845591545105
1058
+ ],
1059
+ [
1060
+ "▁the",
1061
+ -3.7101337909698486
1062
+ ],
1063
+ [
1064
+ "t",
1065
+ -3.804173231124878
1066
+ ],
1067
+ [
1068
+ ".",
1069
+ -3.901777982711792
1070
+ ],
1071
+ [
1072
+ "e",
1073
+ -3.9562227725982666
1074
+ ],
1075
+ [
1076
+ "a",
1077
+ -4.208385944366455
1078
+ ],
1079
+ [
1080
+ "ed",
1081
+ -4.236613750457764
1082
+ ],
1083
+ [
1084
+ "o",
1085
+ -4.267106533050537
1086
+ ],
1087
+ [
1088
+ "▁a",
1089
+ -4.278589725494385
1090
+ ],
1091
+ [
1092
+ "d",
1093
+ -4.31550931930542
1094
+ ],
1095
+ [
1096
+ "▁of",
1097
+ -4.320993900299072
1098
+ ],
1099
+ [
1100
+ "n",
1101
+ -4.32736349105835
1102
+ ],
1103
+ [
1104
+ "▁to",
1105
+ -4.410154819488525
1106
+ ],
1107
+ [
1108
+ "▁and",
1109
+ -4.412942886352539
1110
+ ],
1111
+ [
1112
+ "y",
1113
+ -4.42034387588501
1114
+ ],
1115
+ [
1116
+ "m",
1117
+ -4.563985347747803
1118
+ ],
1119
+ [
1120
+ "ing",
1121
+ -4.623386859893799
1122
+ ],
1123
+ [
1124
+ "i",
1125
+ -4.663941860198975
1126
+ ],
1127
+ [
1128
+ "▁in",
1129
+ -4.729295253753662
1130
+ ],
1131
+ [
1132
+ "r",
1133
+ -4.746771335601807
1134
+ ],
1135
+ [
1136
+ "ar",
1137
+ -4.864534854888916
1138
+ ],
1139
+ [
1140
+ "p",
1141
+ -4.92025089263916
1142
+ ],
1143
+ [
1144
+ "u",
1145
+ -4.934468746185303
1146
+ ],
1147
+ [
1148
+ "al",
1149
+ -5.0213141441345215
1150
+ ],
1151
+ [
1152
+ "c",
1153
+ -5.054973602294922
1154
+ ],
1155
+ [
1156
+ "er",
1157
+ -5.077834606170654
1158
+ ],
1159
+ [
1160
+ "▁I",
1161
+ -5.0813212394714355
1162
+ ],
1163
+ [
1164
+ "re",
1165
+ -5.087508201599121
1166
+ ],
1167
+ [
1168
+ "'",
1169
+ -5.108129024505615
1170
+ ],
1171
+ [
1172
+ "st",
1173
+ -5.111265659332275
1174
+ ],
1175
+ [
1176
+ "in",
1177
+ -5.128320693969727
1178
+ ],
1179
+ [
1180
+ "▁he",
1181
+ -5.188534259796143
1182
+ ],
1183
+ [
1184
+ "▁\"",
1185
+ -5.191742897033691
1186
+ ],
1187
+ [
1188
+ "f",
1189
+ -5.197833061218262
1190
+ ],
1191
+ [
1192
+ "or",
1193
+ -5.202470779418945
1194
+ ],
1195
+ [
1196
+ "ly",
1197
+ -5.220283031463623
1198
+ ],
1199
+ [
1200
+ "l",
1201
+ -5.239860534667969
1202
+ ],
1203
+ [
1204
+ "g",
1205
+ -5.254385948181152
1206
+ ],
1207
+ [
1208
+ "b",
1209
+ -5.298688888549805
1210
+ ],
1211
+ [
1212
+ "▁was",
1213
+ -5.323605060577393
1214
+ ],
1215
+ [
1216
+ "le",
1217
+ -5.332465648651123
1218
+ ],
1219
+ [
1220
+ "▁f",
1221
+ -5.342121601104736
1222
+ ],
1223
+ [
1224
+ "▁that",
1225
+ -5.370957851409912
1226
+ ],
1227
+ [
1228
+ "▁be",
1229
+ -5.396873950958252
1230
+ ],
1231
+ [
1232
+ "▁w",
1233
+ -5.398232460021973
1234
+ ],
1235
+ [
1236
+ "▁b",
1237
+ -5.408049583435059
1238
+ ],
1239
+ [
1240
+ "k",
1241
+ -5.498411178588867
1242
+ ],
1243
+ [
1244
+ "▁it",
1245
+ -5.498649597167969
1246
+ ],
1247
+ [
1248
+ "▁c",
1249
+ -5.5201802253723145
1250
+ ],
1251
+ [
1252
+ "▁for",
1253
+ -5.558920860290527
1254
+ ],
1255
+ [
1256
+ "on",
1257
+ -5.5722246170043945
1258
+ ],
1259
+ [
1260
+ "▁is",
1261
+ -5.581616401672363
1262
+ ],
1263
+ [
1264
+ "▁re",
1265
+ -5.58840274810791
1266
+ ],
1267
+ [
1268
+ "▁p",
1269
+ -5.6148576736450195
1270
+ ],
1271
+ [
1272
+ "th",
1273
+ -5.620391845703125
1274
+ ],
1275
+ [
1276
+ "ur",
1277
+ -5.627001762390137
1278
+ ],
1279
+ [
1280
+ "w",
1281
+ -5.631758213043213
1282
+ ],
1283
+ [
1284
+ "▁his",
1285
+ -5.649014949798584
1286
+ ],
1287
+ [
1288
+ "▁with",
1289
+ -5.651855945587158
1290
+ ],
1291
+ [
1292
+ "ter",
1293
+ -5.65963077545166
1294
+ ],
1295
+ [
1296
+ "ce",
1297
+ -5.678774356842041
1298
+ ],
1299
+ [
1300
+ "an",
1301
+ -5.687518119812012
1302
+ ],
1303
+ [
1304
+ "ri",
1305
+ -5.712653636932373
1306
+ ],
1307
+ [
1308
+ "▁you",
1309
+ -5.749940395355225
1310
+ ],
1311
+ [
1312
+ "h",
1313
+ -5.7766032218933105
1314
+ ],
1315
+ [
1316
+ "es",
1317
+ -5.78245735168457
1318
+ ],
1319
+ [
1320
+ "▁me",
1321
+ -5.794474124908447
1322
+ ],
1323
+ [
1324
+ "it",
1325
+ -5.82189416885376
1326
+ ],
1327
+ [
1328
+ "ro",
1329
+ -5.838681221008301
1330
+ ],
1331
+ [
1332
+ "ent",
1333
+ -5.841847896575928
1334
+ ],
1335
+ [
1336
+ "v",
1337
+ -5.853638648986816
1338
+ ],
1339
+ [
1340
+ "▁had",
1341
+ -5.862800121307373
1342
+ ],
1343
+ [
1344
+ "▁The",
1345
+ -5.867185115814209
1346
+ ],
1347
+ [
1348
+ "en",
1349
+ -5.879556655883789
1350
+ ],
1351
+ [
1352
+ "▁as",
1353
+ -5.892141819000244
1354
+ ],
1355
+ [
1356
+ "▁de",
1357
+ -5.9042229652404785
1358
+ ],
1359
+ [
1360
+ "nd",
1361
+ -5.907679080963135
1362
+ ],
1363
+ [
1364
+ "▁her",
1365
+ -5.914851665496826
1366
+ ],
1367
+ [
1368
+ "ic",
1369
+ -5.9341912269592285
1370
+ ],
1371
+ [
1372
+ "▁not",
1373
+ -5.934748649597168
1374
+ ],
1375
+ [
1376
+ "se",
1377
+ -5.941751480102539
1378
+ ],
1379
+ [
1380
+ ";",
1381
+ -5.961045265197754
1382
+ ],
1383
+ [
1384
+ "te",
1385
+ -5.971425533294678
1386
+ ],
1387
+ [
1388
+ "▁e",
1389
+ -6.0015764236450195
1390
+ ],
1391
+ [
1392
+ "ch",
1393
+ -6.004841327667236
1394
+ ],
1395
+ [
1396
+ "ve",
1397
+ -6.012419700622559
1398
+ ],
1399
+ [
1400
+ "ne",
1401
+ -6.013899803161621
1402
+ ],
1403
+ [
1404
+ "▁A",
1405
+ -6.0221428871154785
1406
+ ],
1407
+ [
1408
+ "▁on",
1409
+ -6.028809070587158
1410
+ ],
1411
+ [
1412
+ "il",
1413
+ -6.041737079620361
1414
+ ],
1415
+ [
1416
+ "is",
1417
+ -6.04183292388916
1418
+ ],
1419
+ [
1420
+ "▁so",
1421
+ -6.059542179107666
1422
+ ],
1423
+ [
1424
+ "▁S",
1425
+ -6.070916652679443
1426
+ ],
1427
+ [
1428
+ "at",
1429
+ -6.07675838470459
1430
+ ],
1431
+ [
1432
+ "la",
1433
+ -6.0870466232299805
1434
+ ],
1435
+ [
1436
+ "ad",
1437
+ -6.092111110687256
1438
+ ],
1439
+ [
1440
+ "▁at",
1441
+ -6.0932087898254395
1442
+ ],
1443
+ [
1444
+ "ir",
1445
+ -6.123541355133057
1446
+ ],
1447
+ [
1448
+ "▁do",
1449
+ -6.126033782958984
1450
+ ],
1451
+ [
1452
+ "ng",
1453
+ -6.177332401275635
1454
+ ],
1455
+ [
1456
+ "▁g",
1457
+ -6.208189487457275
1458
+ ],
1459
+ [
1460
+ "}",
1461
+ -6.218713760375977
1462
+ ],
1463
+ [
1464
+ "ra",
1465
+ -6.221874237060547
1466
+ ],
1467
+ [
1468
+ "▁{",
1469
+ -6.222105979919434
1470
+ ],
1471
+ [
1472
+ "▁mo",
1473
+ -6.245700359344482
1474
+ ],
1475
+ [
1476
+ ".\"",
1477
+ -6.260274410247803
1478
+ ],
1479
+ [
1480
+ "ver",
1481
+ -6.260942459106445
1482
+ ],
1483
+ [
1484
+ "▁ma",
1485
+ -6.267685890197754
1486
+ ],
1487
+ [
1488
+ "▁she",
1489
+ -6.298584938049316
1490
+ ],
1491
+ [
1492
+ "▁con",
1493
+ -6.302234649658203
1494
+ ],
1495
+ [
1496
+ "▁have",
1497
+ -6.305024147033691
1498
+ ],
1499
+ [
1500
+ "▁no",
1501
+ -6.305781364440918
1502
+ ],
1503
+ [
1504
+ "I",
1505
+ -6.324380397796631
1506
+ ],
1507
+ [
1508
+ "▁him",
1509
+ -6.327845573425293
1510
+ ],
1511
+ [
1512
+ "H",
1513
+ -6.332118034362793
1514
+ ],
1515
+ [
1516
+ "el",
1517
+ -6.335691928863525
1518
+ ],
1519
+ [
1520
+ "ll",
1521
+ -6.3703484535217285
1522
+ ],
1523
+ [
1524
+ "ation",
1525
+ -6.3807830810546875
1526
+ ],
1527
+ [
1528
+ "▁fa",
1529
+ -6.381678581237793
1530
+ ],
1531
+ [
1532
+ "▁th",
1533
+ -6.384618282318115
1534
+ ],
1535
+ [
1536
+ "▁su",
1537
+ -6.388561248779297
1538
+ ],
1539
+ [
1540
+ "▁but",
1541
+ -6.402822971343994
1542
+ ],
1543
+ [
1544
+ "lo",
1545
+ -6.411969184875488
1546
+ ],
1547
+ [
1548
+ "li",
1549
+ -6.413236141204834
1550
+ ],
1551
+ [
1552
+ "ther",
1553
+ -6.416365623474121
1554
+ ],
1555
+ [
1556
+ "▁by",
1557
+ -6.421297073364258
1558
+ ],
1559
+ [
1560
+ "▁C",
1561
+ -6.428677558898926
1562
+ ],
1563
+ [
1564
+ "▁which",
1565
+ -6.430390357971191
1566
+ ],
1567
+ [
1568
+ "▁all",
1569
+ -6.434648513793945
1570
+ ],
1571
+ [
1572
+ "id",
1573
+ -6.451933860778809
1574
+ ],
1575
+ [
1576
+ "▁se",
1577
+ -6.453425884246826
1578
+ ],
1579
+ [
1580
+ "▁from",
1581
+ -6.4639363288879395
1582
+ ],
1583
+ [
1584
+ "▁la",
1585
+ -6.464136123657227
1586
+ ],
1587
+ [
1588
+ "▁ex",
1589
+ -6.46722412109375
1590
+ ],
1591
+ [
1592
+ "▁or",
1593
+ -6.478571891784668
1594
+ ],
1595
+ [
1596
+ "▁B",
1597
+ -6.484975337982178
1598
+ ],
1599
+ [
1600
+ "▁are",
1601
+ -6.488126277923584
1602
+ ],
1603
+ [
1604
+ "▁M",
1605
+ -6.489441871643066
1606
+ ],
1607
+ [
1608
+ "▁He",
1609
+ -6.494318008422852
1610
+ ],
1611
+ [
1612
+ "R",
1613
+ -6.496219158172607
1614
+ ],
1615
+ [
1616
+ "▁my",
1617
+ -6.501786708831787
1618
+ ],
1619
+ [
1620
+ "ul",
1621
+ -6.503665447235107
1622
+ ],
1623
+ [
1624
+ "un",
1625
+ -6.507981300354004
1626
+ ],
1627
+ [
1628
+ "▁this",
1629
+ -6.509151458740234
1630
+ ],
1631
+ [
1632
+ "▁we",
1633
+ -6.509461402893066
1634
+ ],
1635
+ [
1636
+ "▁were",
1637
+ -6.515420913696289
1638
+ ],
1639
+ [
1640
+ ",\"",
1641
+ -6.516303539276123
1642
+ ],
1643
+ [
1644
+ "ck",
1645
+ -6.536041736602783
1646
+ ],
1647
+ [
1648
+ "▁who",
1649
+ -6.560507297515869
1650
+ ],
1651
+ [
1652
+ "▁sh",
1653
+ -6.560885429382324
1654
+ ],
1655
+ [
1656
+ "▁[",
1657
+ -6.566647529602051
1658
+ ],
1659
+ [
1660
+ "ow",
1661
+ -6.567778587341309
1662
+ ],
1663
+ [
1664
+ "▁said",
1665
+ -6.572078227996826
1666
+ ],
1667
+ [
1668
+ "▁P",
1669
+ -6.572185516357422
1670
+ ],
1671
+ [
1672
+ "D",
1673
+ -6.575502872467041
1674
+ ],
1675
+ [
1676
+ "et",
1677
+ -6.59080696105957
1678
+ ],
1679
+ [
1680
+ "ion",
1681
+ -6.602999687194824
1682
+ ],
1683
+ [
1684
+ "▁L",
1685
+ -6.604433536529541
1686
+ ],
1687
+ [
1688
+ "ant",
1689
+ -6.60852575302124
1690
+ ],
1691
+ [
1692
+ "ment",
1693
+ -6.609012603759766
1694
+ ],
1695
+ [
1696
+ "▁W",
1697
+ -6.60936975479126
1698
+ ],
1699
+ [
1700
+ "▁po",
1701
+ -6.6191205978393555
1702
+ ],
1703
+ [
1704
+ "am",
1705
+ -6.623525142669678
1706
+ ],
1707
+ [
1708
+ "vi",
1709
+ -6.627021789550781
1710
+ ],
1711
+ [
1712
+ "▁]",
1713
+ -6.630141258239746
1714
+ ],
1715
+ [
1716
+ "▁one",
1717
+ -6.639581203460693
1718
+ ],
1719
+ [
1720
+ "x",
1721
+ -6.648676872253418
1722
+ ],
1723
+ [
1724
+ "ct",
1725
+ -6.654047012329102
1726
+ ],
1727
+ [
1728
+ "▁an",
1729
+ -6.659359931945801
1730
+ ],
1731
+ [
1732
+ "period",
1733
+ -6.661555290222168
1734
+ ],
1735
+ [
1736
+ "us",
1737
+ -6.667537689208984
1738
+ ],
1739
+ [
1740
+ "pp",
1741
+ -6.67457389831543
1742
+ ],
1743
+ [
1744
+ "im",
1745
+ -6.682270526885986
1746
+ ],
1747
+ [
1748
+ "▁man",
1749
+ -6.685298442840576
1750
+ ],
1751
+ [
1752
+ "▁pro",
1753
+ -6.695033073425293
1754
+ ],
1755
+ [
1756
+ "ut",
1757
+ -6.69868803024292
1758
+ ],
1759
+ [
1760
+ "▁sp",
1761
+ -6.702606201171875
1762
+ ],
1763
+ [
1764
+ "▁ho",
1765
+ -6.7108588218688965
1766
+ ],
1767
+ [
1768
+ "▁le",
1769
+ -6.717572212219238
1770
+ ],
1771
+ [
1772
+ "▁ca",
1773
+ -6.721408367156982
1774
+ ],
1775
+ [
1776
+ "j",
1777
+ -6.72599983215332
1778
+ ],
1779
+ [
1780
+ "ough",
1781
+ -6.729132175445557
1782
+ ],
1783
+ [
1784
+ "▁go",
1785
+ -6.737214088439941
1786
+ ],
1787
+ [
1788
+ "ge",
1789
+ -6.741745471954346
1790
+ ],
1791
+ [
1792
+ "▁ha",
1793
+ -6.7515549659729
1794
+ ],
1795
+ [
1796
+ "▁F",
1797
+ -6.754391670227051
1798
+ ],
1799
+ [
1800
+ "▁mi",
1801
+ -6.762367248535156
1802
+ ],
1803
+ [
1804
+ "ound",
1805
+ -6.773167610168457
1806
+ ],
1807
+ [
1808
+ "▁they",
1809
+ -6.787041664123535
1810
+ ],
1811
+ [
1812
+ "▁would",
1813
+ -6.787123203277588
1814
+ ],
1815
+ [
1816
+ "hi",
1817
+ -6.79250955581665
1818
+ ],
1819
+ [
1820
+ "ke",
1821
+ -6.795114040374756
1822
+ ],
1823
+ [
1824
+ "ive",
1825
+ -6.7967634201049805
1826
+ ],
1827
+ [
1828
+ "ate",
1829
+ -6.804977893829346
1830
+ ],
1831
+ [
1832
+ "▁T",
1833
+ -6.80813455581665
1834
+ ],
1835
+ [
1836
+ "z",
1837
+ -6.812094211578369
1838
+ ],
1839
+ [
1840
+ "per",
1841
+ -6.8176069259643555
1842
+ ],
1843
+ [
1844
+ "▁sa",
1845
+ -6.819145679473877
1846
+ ],
1847
+ [
1848
+ "▁out",
1849
+ -6.826883316040039
1850
+ ],
1851
+ [
1852
+ "ol",
1853
+ -6.833821773529053
1854
+ ],
1855
+ [
1856
+ "▁up",
1857
+ -6.834559440612793
1858
+ ],
1859
+ [
1860
+ "co",
1861
+ -6.844675540924072
1862
+ ],
1863
+ [
1864
+ "▁pa",
1865
+ -6.855184078216553
1866
+ ],
1867
+ [
1868
+ "A",
1869
+ -6.858541965484619
1870
+ ],
1871
+ [
1872
+ "old",
1873
+ -6.860659599304199
1874
+ ],
1875
+ [
1876
+ "!",
1877
+ -6.876157760620117
1878
+ ],
1879
+ [
1880
+ "▁dis",
1881
+ -6.8776350021362305
1882
+ ],
1883
+ [
1884
+ "▁see",
1885
+ -6.881742000579834
1886
+ ],
1887
+ [
1888
+ "ry",
1889
+ -6.883233547210693
1890
+ ],
1891
+ [
1892
+ "ff",
1893
+ -6.894794940948486
1894
+ ],
1895
+ [
1896
+ "N",
1897
+ -6.899501800537109
1898
+ ],
1899
+ [
1900
+ "▁un",
1901
+ -6.9004364013671875
1902
+ ],
1903
+ [
1904
+ "▁co",
1905
+ -6.9032673835754395
1906
+ ],
1907
+ [
1908
+ "▁O",
1909
+ -6.907948017120361
1910
+ ],
1911
+ [
1912
+ "▁been",
1913
+ -6.916658878326416
1914
+ ],
1915
+ [
1916
+ "ity",
1917
+ -6.9181389808654785
1918
+ ],
1919
+ [
1920
+ "he",
1921
+ -6.921543121337891
1922
+ ],
1923
+ [
1924
+ "▁di",
1925
+ -6.923122882843018
1926
+ ],
1927
+ [
1928
+ "lu",
1929
+ -6.931999206542969
1930
+ ],
1931
+ [
1932
+ "▁there",
1933
+ -6.934426784515381
1934
+ ],
1935
+ [
1936
+ "▁their",
1937
+ -6.934463977813721
1938
+ ],
1939
+ [
1940
+ "der",
1941
+ -6.935543537139893
1942
+ ],
1943
+ [
1944
+ "est",
1945
+ -6.940905570983887
1946
+ ],
1947
+ [
1948
+ "E",
1949
+ -6.9548492431640625
1950
+ ],
1951
+ [
1952
+ "▁will",
1953
+ -6.958393096923828
1954
+ ],
1955
+ [
1956
+ "ight",
1957
+ -6.965392112731934
1958
+ ],
1959
+ [
1960
+ "_",
1961
+ -6.966725826263428
1962
+ ],
1963
+ [
1964
+ "mp",
1965
+ -6.967768669128418
1966
+ ],
1967
+ [
1968
+ "▁fi",
1969
+ -6.972559452056885
1970
+ ],
1971
+ [
1972
+ "ish",
1973
+ -6.979668617248535
1974
+ ],
1975
+ [
1976
+ "ance",
1977
+ -6.982175827026367
1978
+ ],
1979
+ [
1980
+ "ci",
1981
+ -6.984276294708252
1982
+ ],
1983
+ [
1984
+ "▁E",
1985
+ -6.989256381988525
1986
+ ],
1987
+ [
1988
+ "▁tr",
1989
+ -6.989500999450684
1990
+ ],
1991
+ [
1992
+ "▁G",
1993
+ -6.999267578125
1994
+ ],
1995
+ [
1996
+ "▁li",
1997
+ -7.00178861618042
1998
+ ],
1999
+ [
2000
+ "pe",
2001
+ -7.007822513580322
2002
+ ],
2003
+ [
2004
+ "▁bo",
2005
+ -7.010932922363281
2006
+ ],
2007
+ [
2008
+ "▁No",
2009
+ -7.014448642730713
2010
+ ],
2011
+ [
2012
+ "1",
2013
+ -7.014896392822266
2014
+ ],
2015
+ [
2016
+ "qu",
2017
+ -7.022171974182129
2018
+ ],
2019
+ [
2020
+ "ill",
2021
+ -7.023817539215088
2022
+ ],
2023
+ [
2024
+ "ard",
2025
+ -7.024126052856445
2026
+ ],
2027
+ [
2028
+ "?",
2029
+ -7.025038242340088
2030
+ ],
2031
+ [
2032
+ "able",
2033
+ -7.033109188079834
2034
+ ],
2035
+ [
2036
+ "▁when",
2037
+ -7.038402080535889
2038
+ ],
2039
+ [
2040
+ "ten",
2041
+ -7.03859806060791
2042
+ ],
2043
+ [
2044
+ "age",
2045
+ -7.041808605194092
2046
+ ],
2047
+ [
2048
+ "?\"",
2049
+ -7.047811508178711
2050
+ ],
2051
+ [
2052
+ "▁en",
2053
+ -7.050681114196777
2054
+ ],
2055
+ [
2056
+ "ous",
2057
+ -7.060575008392334
2058
+ ],
2059
+ [
2060
+ "tra",
2061
+ -7.063107490539551
2062
+ ],
2063
+ [
2064
+ "ence",
2065
+ -7.072290420532227
2066
+ ],
2067
+ [
2068
+ "ect",
2069
+ -7.075937271118164
2070
+ ],
2071
+ [
2072
+ "J",
2073
+ -7.077571868896484
2074
+ ],
2075
+ [
2076
+ "▁some",
2077
+ -7.081721782684326
2078
+ ],
2079
+ [
2080
+ "▁them",
2081
+ -7.1001973152160645
2082
+ ],
2083
+ [
2084
+ "▁ne",
2085
+ -7.104800701141357
2086
+ ],
2087
+ [
2088
+ "▁could",
2089
+ -7.107924938201904
2090
+ ],
2091
+ [
2092
+ "▁can",
2093
+ -7.113558292388916
2094
+ ],
2095
+ [
2096
+ "▁if",
2097
+ -7.1193528175354
2098
+ ],
2099
+ [
2100
+ "▁what",
2101
+ -7.127973556518555
2102
+ ],
2103
+ [
2104
+ "▁know",
2105
+ -7.130826473236084
2106
+ ],
2107
+ [
2108
+ "ful",
2109
+ -7.136860370635986
2110
+ ],
2111
+ [
2112
+ "O",
2113
+ -7.163638114929199
2114
+ ],
2115
+ [
2116
+ "ru",
2117
+ -7.164709091186523
2118
+ ],
2119
+ [
2120
+ "ell",
2121
+ -7.16968297958374
2122
+ ],
2123
+ [
2124
+ "▁sta",
2125
+ -7.1711201667785645
2126
+ ],
2127
+ [
2128
+ "▁time",
2129
+ -7.173023700714111
2130
+ ],
2131
+ [
2132
+ "▁any",
2133
+ -7.18028450012207
2134
+ ],
2135
+ [
2136
+ "▁ra",
2137
+ -7.186168193817139
2138
+ ],
2139
+ [
2140
+ "▁more",
2141
+ -7.186279296875
2142
+ ],
2143
+ [
2144
+ "▁into",
2145
+ -7.190565586090088
2146
+ ],
2147
+ [
2148
+ "ome",
2149
+ -7.193604469299316
2150
+ ],
2151
+ [
2152
+ "T",
2153
+ -7.196429252624512
2154
+ ],
2155
+ [
2156
+ "▁other",
2157
+ -7.204713344573975
2158
+ ],
2159
+ [
2160
+ ":",
2161
+ -7.209097862243652
2162
+ ],
2163
+ [
2164
+ "ies",
2165
+ -7.218936920166016
2166
+ ],
2167
+ [
2168
+ "▁your",
2169
+ -7.223606109619141
2170
+ ],
2171
+ [
2172
+ "▁And",
2173
+ -7.229339122772217
2174
+ ],
2175
+ [
2176
+ "▁ye",
2177
+ -7.232705116271973
2178
+ ],
2179
+ [
2180
+ "S",
2181
+ -7.234437942504883
2182
+ ],
2183
+ [
2184
+ "▁like",
2185
+ -7.235995769500732
2186
+ ],
2187
+ [
2188
+ "ness",
2189
+ -7.239752292633057
2190
+ ],
2191
+ [
2192
+ "▁dr",
2193
+ -7.25142765045166
2194
+ ],
2195
+ [
2196
+ "low",
2197
+ -7.2691826820373535
2198
+ ],
2199
+ [
2200
+ "▁It",
2201
+ -7.270310878753662
2202
+ ],
2203
+ [
2204
+ "▁sto",
2205
+ -7.2738118171691895
2206
+ ],
2207
+ [
2208
+ "▁us",
2209
+ -7.2769670486450195
2210
+ ],
2211
+ [
2212
+ "▁But",
2213
+ -7.279003620147705
2214
+ ],
2215
+ [
2216
+ "▁pre",
2217
+ -7.3001484870910645
2218
+ ],
2219
+ [
2220
+ "mb",
2221
+ -7.302000522613525
2222
+ ],
2223
+ [
2224
+ "side",
2225
+ -7.3103156089782715
2226
+ ],
2227
+ [
2228
+ "▁has",
2229
+ -7.317553520202637
2230
+ ],
2231
+ [
2232
+ "row",
2233
+ -7.317798137664795
2234
+ ],
2235
+ [
2236
+ "▁There",
2237
+ -7.321875095367432
2238
+ ],
2239
+ [
2240
+ "cu",
2241
+ -7.324193000793457
2242
+ ],
2243
+ [
2244
+ "▁very",
2245
+ -7.333028793334961
2246
+ ],
2247
+ [
2248
+ "▁than",
2249
+ -7.334202289581299
2250
+ ],
2251
+ [
2252
+ "▁lo",
2253
+ -7.334221363067627
2254
+ ],
2255
+ [
2256
+ "▁did",
2257
+ -7.338522911071777
2258
+ ],
2259
+ [
2260
+ "ach",
2261
+ -7.3409504890441895
2262
+ ],
2263
+ [
2264
+ "▁about",
2265
+ -7.341108322143555
2266
+ ],
2267
+ [
2268
+ "▁day",
2269
+ -7.3574066162109375
2270
+ ],
2271
+ [
2272
+ "▁over",
2273
+ -7.358343601226807
2274
+ ],
2275
+ [
2276
+ "▁look",
2277
+ -7.359866142272949
2278
+ ],
2279
+ [
2280
+ "-",
2281
+ -7.369490146636963
2282
+ ],
2283
+ [
2284
+ "tion",
2285
+ -7.372918605804443
2286
+ ],
2287
+ [
2288
+ "ture",
2289
+ -7.383158206939697
2290
+ ],
2291
+ [
2292
+ "▁Mr",
2293
+ -7.389575958251953
2294
+ ],
2295
+ [
2296
+ "ph",
2297
+ -7.394842147827148
2298
+ ],
2299
+ [
2300
+ "▁little",
2301
+ -7.395269393920898
2302
+ ],
2303
+ [
2304
+ "ho",
2305
+ -7.403622627258301
2306
+ ],
2307
+ [
2308
+ "▁again",
2309
+ -7.425952911376953
2310
+ ],
2311
+ [
2312
+ "ction",
2313
+ -7.430576801300049
2314
+ ],
2315
+ [
2316
+ "ig",
2317
+ -7.438838958740234
2318
+ ],
2319
+ [
2320
+ "▁hand",
2321
+ -7.44147253036499
2322
+ ],
2323
+ [
2324
+ "▁now",
2325
+ -7.445526123046875
2326
+ ],
2327
+ [
2328
+ "qui",
2329
+ -7.446432590484619
2330
+ ],
2331
+ [
2332
+ "▁sc",
2333
+ -7.450597763061523
2334
+ ],
2335
+ [
2336
+ "▁should",
2337
+ -7.453698635101318
2338
+ ],
2339
+ [
2340
+ "▁great",
2341
+ -7.463414669036865
2342
+ ],
2343
+ [
2344
+ "▁two",
2345
+ -7.470479965209961
2346
+ ],
2347
+ [
2348
+ "!\"",
2349
+ -7.471007347106934
2350
+ ],
2351
+ [
2352
+ "▁right",
2353
+ -7.472324371337891
2354
+ ],
2355
+ [
2356
+ "ious",
2357
+ -7.481609344482422
2358
+ ],
2359
+ [
2360
+ "man",
2361
+ -7.48414945602417
2362
+ ],
2363
+ [
2364
+ "—",
2365
+ -7.484794616699219
2366
+ ],
2367
+ [
2368
+ "▁our",
2369
+ -7.485799789428711
2370
+ ],
2371
+ [
2372
+ "You",
2373
+ -7.504610538482666
2374
+ ],
2375
+ [
2376
+ "▁say",
2377
+ -7.508391380310059
2378
+ ],
2379
+ [
2380
+ "▁upon",
2381
+ -7.5086517333984375
2382
+ ],
2383
+ [
2384
+ "▁hu",
2385
+ -7.517370223999023
2386
+ ],
2387
+ [
2388
+ "▁comp",
2389
+ -7.519069194793701
2390
+ ],
2391
+ [
2392
+ "ress",
2393
+ -7.519094467163086
2394
+ ],
2395
+ [
2396
+ "▁only",
2397
+ -7.524586200714111
2398
+ ],
2399
+ [
2400
+ "▁She",
2401
+ -7.52614164352417
2402
+ ],
2403
+ [
2404
+ "less",
2405
+ -7.530499458312988
2406
+ ],
2407
+ [
2408
+ "ated",
2409
+ -7.53152322769165
2410
+ ],
2411
+ [
2412
+ "2",
2413
+ -7.5379862785339355
2414
+ ],
2415
+ [
2416
+ "▁left",
2417
+ -7.551514625549316
2418
+ ],
2419
+ [
2420
+ "▁down",
2421
+ -7.5532917976379395
2422
+ ],
2423
+ [
2424
+ "▁ga",
2425
+ -7.558096885681152
2426
+ ],
2427
+ [
2428
+ "0",
2429
+ -7.565780162811279
2430
+ ],
2431
+ [
2432
+ "▁da",
2433
+ -7.566195487976074
2434
+ ],
2435
+ [
2436
+ "▁after",
2437
+ -7.572922706604004
2438
+ ],
2439
+ [
2440
+ "▁made",
2441
+ -7.585012912750244
2442
+ ],
2443
+ [
2444
+ "tain",
2445
+ -7.588871002197266
2446
+ ],
2447
+ [
2448
+ "ick",
2449
+ -7.591251850128174
2450
+ ],
2451
+ [
2452
+ "new",
2453
+ -7.592978000640869
2454
+ ],
2455
+ [
2456
+ "▁com",
2457
+ -7.594455718994141
2458
+ ],
2459
+ [
2460
+ "ving",
2461
+ -7.599806308746338
2462
+ ],
2463
+ [
2464
+ "▁Ma",
2465
+ -7.607027053833008
2466
+ ],
2467
+ [
2468
+ "ward",
2469
+ -7.620022296905518
2470
+ ],
2471
+ [
2472
+ "▁before",
2473
+ -7.6242265701293945
2474
+ ],
2475
+ [
2476
+ "W",
2477
+ -7.633222579956055
2478
+ ],
2479
+ [
2480
+ "U",
2481
+ -7.636125087738037
2482
+ ],
2483
+ [
2484
+ "ven",
2485
+ -7.636752605438232
2486
+ ],
2487
+ [
2488
+ "▁good",
2489
+ -7.638261795043945
2490
+ ],
2491
+ [
2492
+ "ign",
2493
+ -7.644953727722168
2494
+ ],
2495
+ [
2496
+ "tru",
2497
+ -7.648081302642822
2498
+ ],
2499
+ [
2500
+ "▁cha",
2501
+ -7.657571315765381
2502
+ ],
2503
+ [
2504
+ "▁long",
2505
+ -7.658087253570557
2506
+ ],
2507
+ [
2508
+ "▁how",
2509
+ -7.661345958709717
2510
+ ],
2511
+ [
2512
+ "Y",
2513
+ -7.661816120147705
2514
+ ],
2515
+ [
2516
+ "paragraph",
2517
+ -7.664102554321289
2518
+ ],
2519
+ [
2520
+ "▁come",
2521
+ -7.672017574310303
2522
+ ],
2523
+ [
2524
+ "▁part",
2525
+ -7.674316883087158
2526
+ ],
2527
+ [
2528
+ "V",
2529
+ -7.675340175628662
2530
+ ],
2531
+ [
2532
+ "▁ob",
2533
+ -7.680867671966553
2534
+ ],
2535
+ [
2536
+ "▁thing",
2537
+ -7.68093729019165
2538
+ ],
2539
+ [
2540
+ "▁must",
2541
+ -7.683284759521484
2542
+ ],
2543
+ [
2544
+ "ha",
2545
+ -7.684486389160156
2546
+ ],
2547
+ [
2548
+ "▁even",
2549
+ -7.699027061462402
2550
+ ],
2551
+ [
2552
+ "▁way",
2553
+ -7.6992669105529785
2554
+ ],
2555
+ [
2556
+ "▁take",
2557
+ -7.7000508308410645
2558
+ ],
2559
+ [
2560
+ "K",
2561
+ -7.707818984985352
2562
+ ],
2563
+ [
2564
+ "▁back",
2565
+ -7.712675094604492
2566
+ ],
2567
+ [
2568
+ "▁under",
2569
+ -7.713348865509033
2570
+ ],
2571
+ [
2572
+ "▁came",
2573
+ -7.716226577758789
2574
+ ],
2575
+ [
2576
+ "L",
2577
+ -7.728263854980469
2578
+ ],
2579
+ [
2580
+ "▁well",
2581
+ -7.731564521789551
2582
+ ],
2583
+ [
2584
+ "▁think",
2585
+ -7.732664108276367
2586
+ ],
2587
+ [
2588
+ "▁never",
2589
+ -7.760746479034424
2590
+ ],
2591
+ [
2592
+ "▁much",
2593
+ -7.763637542724609
2594
+ ],
2595
+ [
2596
+ "▁gra",
2597
+ -7.766099452972412
2598
+ ],
2599
+ [
2600
+ "▁first",
2601
+ -7.774232387542725
2602
+ ],
2603
+ [
2604
+ "▁every",
2605
+ -7.776756286621094
2606
+ ],
2607
+ [
2608
+ "ugh",
2609
+ -7.790675640106201
2610
+ ],
2611
+ [
2612
+ "▁such",
2613
+ -7.791811466217041
2614
+ ],
2615
+ [
2616
+ "▁where",
2617
+ -7.792065620422363
2618
+ ],
2619
+ [
2620
+ "land",
2621
+ -7.801862716674805
2622
+ ],
2623
+ [
2624
+ "▁Ch",
2625
+ -7.81598424911499
2626
+ ],
2627
+ [
2628
+ "▁imp",
2629
+ -7.82216739654541
2630
+ ],
2631
+ [
2632
+ "▁through",
2633
+ -7.82563591003418
2634
+ ],
2635
+ [
2636
+ "▁own",
2637
+ -7.836172103881836
2638
+ ],
2639
+ [
2640
+ "M",
2641
+ -7.846076011657715
2642
+ ],
2643
+ [
2644
+ "▁make",
2645
+ -7.847618579864502
2646
+ ],
2647
+ [
2648
+ "ook",
2649
+ -7.852067947387695
2650
+ ],
2651
+ [
2652
+ "3",
2653
+ -7.856833457946777
2654
+ ],
2655
+ [
2656
+ "use",
2657
+ -7.864173412322998
2658
+ ],
2659
+ [
2660
+ "C",
2661
+ -7.868447780609131
2662
+ ],
2663
+ [
2664
+ "▁place",
2665
+ -7.8706769943237305
2666
+ ],
2667
+ [
2668
+ "ition",
2669
+ -7.872597694396973
2670
+ ],
2671
+ [
2672
+ "serv",
2673
+ -7.8729166984558105
2674
+ ],
2675
+ [
2676
+ "pri",
2677
+ -7.888969421386719
2678
+ ],
2679
+ [
2680
+ "▁Th",
2681
+ -7.891221046447754
2682
+ ],
2683
+ [
2684
+ "▁give",
2685
+ -7.896949291229248
2686
+ ],
2687
+ [
2688
+ "▁just",
2689
+ -7.907407283782959
2690
+ ],
2691
+ [
2692
+ "5",
2693
+ -7.91018009185791
2694
+ ],
2695
+ [
2696
+ "ible",
2697
+ -7.912111759185791
2698
+ ],
2699
+ [
2700
+ "▁himself",
2701
+ -7.912140846252441
2702
+ ],
2703
+ [
2704
+ "▁might",
2705
+ -7.9202799797058105
2706
+ ],
2707
+ [
2708
+ "4",
2709
+ -7.932826995849609
2710
+ ],
2711
+ [
2712
+ "▁sw",
2713
+ -7.935232639312744
2714
+ ],
2715
+ [
2716
+ "▁life",
2717
+ -7.939643859863281
2718
+ ],
2719
+ [
2720
+ "▁without",
2721
+ -7.945864200592041
2722
+ ],
2723
+ [
2724
+ "▁get",
2725
+ -7.974736213684082
2726
+ ],
2727
+ [
2728
+ "▁work",
2729
+ -7.981810092926025
2730
+ ],
2731
+ [
2732
+ "▁du",
2733
+ -7.989681243896484
2734
+ ],
2735
+ [
2736
+ "▁pass",
2737
+ -7.998128414154053
2738
+ ],
2739
+ [
2740
+ "▁appear",
2741
+ -7.999986171722412
2742
+ ],
2743
+ [
2744
+ "▁house",
2745
+ -8.005838394165039
2746
+ ],
2747
+ [
2748
+ "What",
2749
+ -8.007867813110352
2750
+ ],
2751
+ [
2752
+ "▁away",
2753
+ -8.007869720458984
2754
+ ],
2755
+ [
2756
+ "▁love",
2757
+ -8.025405883789062
2758
+ ],
2759
+ [
2760
+ "▁call",
2761
+ -8.0332670211792
2762
+ ],
2763
+ [
2764
+ "line",
2765
+ -8.03477668762207
2766
+ ],
2767
+ [
2768
+ "▁turn",
2769
+ -8.039846420288086
2770
+ ],
2771
+ [
2772
+ "▁shall",
2773
+ -8.04392147064209
2774
+ ],
2775
+ [
2776
+ "▁This",
2777
+ -8.048504829406738
2778
+ ],
2779
+ [
2780
+ "▁mu",
2781
+ -8.054326057434082
2782
+ ],
2783
+ [
2784
+ "\"",
2785
+ -8.055206298828125
2786
+ ],
2787
+ [
2788
+ "▁those",
2789
+ -8.07278060913086
2790
+ ],
2791
+ [
2792
+ "▁night",
2793
+ -8.08027458190918
2794
+ ],
2795
+ [
2796
+ "The",
2797
+ -8.095630645751953
2798
+ ],
2799
+ [
2800
+ "port",
2801
+ -8.105117797851562
2802
+ ],
2803
+ [
2804
+ "▁word",
2805
+ -8.1146240234375
2806
+ ],
2807
+ [
2808
+ "B",
2809
+ -8.122001647949219
2810
+ ],
2811
+ [
2812
+ "P",
2813
+ -8.130742073059082
2814
+ ],
2815
+ [
2816
+ "*",
2817
+ -8.130934715270996
2818
+ ],
2819
+ [
2820
+ "▁heart",
2821
+ -8.131942749023438
2822
+ ],
2823
+ [
2824
+ "▁still",
2825
+ -8.140876770019531
2826
+ ],
2827
+ [
2828
+ "▁next",
2829
+ -8.142328262329102
2830
+ ],
2831
+ [
2832
+ "▁tell",
2833
+ -8.165434837341309
2834
+ ],
2835
+ [
2836
+ "▁room",
2837
+ -8.168549537658691
2838
+ ],
2839
+ [
2840
+ "8",
2841
+ -8.179108619689941
2842
+ ],
2843
+ [
2844
+ "6",
2845
+ -8.192533493041992
2846
+ ],
2847
+ [
2848
+ "▁three",
2849
+ -8.194826126098633
2850
+ ],
2851
+ [
2852
+ "▁friend",
2853
+ -8.195451736450195
2854
+ ],
2855
+ [
2856
+ "▁people",
2857
+ -8.210062026977539
2858
+ ],
2859
+ [
2860
+ ")",
2861
+ -8.219422340393066
2862
+ ],
2863
+ [
2864
+ "▁same",
2865
+ -8.228056907653809
2866
+ ],
2867
+ [
2868
+ "▁change",
2869
+ -8.232505798339844
2870
+ ],
2871
+ [
2872
+ "▁Do",
2873
+ -8.238903045654297
2874
+ ],
2875
+ [
2876
+ "▁follow",
2877
+ -8.242085456848145
2878
+ ],
2879
+ [
2880
+ "▁while",
2881
+ -8.25542163848877
2882
+ ],
2883
+ [
2884
+ "▁nothing",
2885
+ -8.257135391235352
2886
+ ],
2887
+ [
2888
+ "▁child",
2889
+ -8.258654594421387
2890
+ ],
2891
+ [
2892
+ "7",
2893
+ -8.28925895690918
2894
+ ],
2895
+ [
2896
+ "body",
2897
+ -8.293252944946289
2898
+ ],
2899
+ [
2900
+ "▁Ne",
2901
+ -8.295580863952637
2902
+ ],
2903
+ [
2904
+ "That",
2905
+ -8.300972938537598
2906
+ ],
2907
+ [
2908
+ "comma",
2909
+ -8.305874824523926
2910
+ ],
2911
+ [
2912
+ "▁light",
2913
+ -8.315401077270508
2914
+ ],
2915
+ [
2916
+ "▁another",
2917
+ -8.327678680419922
2918
+ ],
2919
+ [
2920
+ "▁God",
2921
+ -8.32773208618164
2922
+ ],
2923
+ [
2924
+ "▁name",
2925
+ -8.329839706420898
2926
+ ],
2927
+ [
2928
+ "▁asked",
2929
+ -8.341205596923828
2930
+ ],
2931
+ [
2932
+ "▁small",
2933
+ -8.341238975524902
2934
+ ],
2935
+ [
2936
+ "colon",
2937
+ -8.343833923339844
2938
+ ],
2939
+ [
2940
+ "▁open",
2941
+ -8.344965934753418
2942
+ ],
2943
+ [
2944
+ "G",
2945
+ -8.348701477050781
2946
+ ],
2947
+ [
2948
+ "▁IMPRESSION",
2949
+ -8.353904724121094
2950
+ ],
2951
+ [
2952
+ "▁normal",
2953
+ -8.35440444946289
2954
+ ],
2955
+ [
2956
+ "▁present",
2957
+ -8.373336791992188
2958
+ ],
2959
+ [
2960
+ "▁When",
2961
+ -8.376485824584961
2962
+ ],
2963
+ [
2964
+ "ific",
2965
+ -8.377349853515625
2966
+ ],
2967
+ [
2968
+ "▁world",
2969
+ -8.383152961730957
2970
+ ],
2971
+ [
2972
+ "▁answer",
2973
+ -8.385011672973633
2974
+ ],
2975
+ [
2976
+ "▁also",
2977
+ -8.393239974975586
2978
+ ],
2979
+ [
2980
+ "▁view",
2981
+ -8.396413803100586
2982
+ ],
2983
+ [
2984
+ "▁seemed",
2985
+ -8.415339469909668
2986
+ ],
2987
+ [
2988
+ "spect",
2989
+ -8.4177827835083
2990
+ ],
2991
+ [
2992
+ "▁always",
2993
+ -8.420531272888184
2994
+ ],
2995
+ [
2996
+ "F",
2997
+ -8.425969123840332
2998
+ ],
2999
+ [
3000
+ "9",
3001
+ -8.447197914123535
3002
+ ],
3003
+ [
3004
+ "▁knew",
3005
+ -8.451355934143066
3006
+ ],
3007
+ [
3008
+ "▁feel",
3009
+ -8.478418350219727
3010
+ ],
3011
+ [
3012
+ "▁girl",
3013
+ -8.489920616149902
3014
+ ],
3015
+ [
3016
+ "▁woman",
3017
+ -8.507888793945312
3018
+ ],
3019
+ [
3020
+ "▁something",
3021
+ -8.509052276611328
3022
+ ],
3023
+ [
3024
+ "▁return",
3025
+ -8.517931938171387
3026
+ ],
3027
+ [
3028
+ "▁kind",
3029
+ -8.522177696228027
3030
+ ],
3031
+ [
3032
+ "▁high",
3033
+ -8.52599048614502
3034
+ ],
3035
+ [
3036
+ "▁because",
3037
+ -8.552063941955566
3038
+ ],
3039
+ [
3040
+ "]",
3041
+ -8.968759536743164
3042
+ ],
3043
+ [
3044
+ "Q",
3045
+ -9.035648345947266
3046
+ ],
3047
+ [
3048
+ "X",
3049
+ -9.060090065002441
3050
+ ],
3051
+ [
3052
+ "/",
3053
+ -9.441831588745117
3054
+ ],
3055
+ [
3056
+ "Z",
3057
+ -9.746284484863281
3058
+ ],
3059
+ [
3060
+ "[",
3061
+ -10.22707748413086
3062
+ ],
3063
+ [
3064
+ "%",
3065
+ -10.805520057678223
3066
+ ],
3067
+ [
3068
+ "q",
3069
+ -11.41718578338623
3070
+ ],
3071
+ [
3072
+ "+",
3073
+ -12.179898262023926
3074
+ ],
3075
+ [
3076
+ "{",
3077
+ -12.902261734008789
3078
+ ]
3079
+ ],
3080
+ "byte_fallback": false
3081
+ }
3082
+ }
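The `[piece, score]` pairs above look like a SentencePiece-style Unigram vocabulary, in which each score is a natural-log probability. That reading is an assumption based on the `▁` word-boundary markers and the negative, descending scores; the diff itself does not say so. Under that assumption, a minimal sketch of turning a few of the listed scores back into probabilities (`sample_vocab` and `piece_probability` are illustrative names, not part of the repository):

```python
import math

# A few [piece, score] pairs copied verbatim from the vocab entries above.
sample_vocab = [
    ("▁should", -7.453698635101318),
    ("▁IMPRESSION", -8.353904724121094),
    ("{", -12.902261734008789),
]


def piece_probability(score: float) -> float:
    """Convert a score to a probability.

    Assumption: the score is a natural-log probability, as is conventional
    for Unigram tokenizer vocabularies.
    """
    return math.exp(score)


probs = {piece: piece_probability(score) for piece, score in sample_vocab}

for piece, p in probs.items():
    print(f"{piece!r}: {p:.2e}")
```

A higher (less negative) score means a more frequent piece, which is why the rare symbols such as `{` sit at the very end of the list.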
processor/tokenizer_config.json ADDED
@@ -0,0 +1,246 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ {
4
+ "__type": "AddedToken",
5
+ "content": "<epsilon>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ {
13
+ "__type": "AddedToken",
14
+ "content": "<s>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ {
22
+ "__type": "AddedToken",
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ },
30
+ "<extra_id_0>",
31
+ "<extra_id_1>",
32
+ "<extra_id_2>",
33
+ "<extra_id_3>",
34
+ "<extra_id_4>",
35
+ "<extra_id_5>",
36
+ "<extra_id_6>",
37
+ "<extra_id_7>",
38
+ "<extra_id_8>",
39
+ "<extra_id_9>",
40
+ "<extra_id_10>",
41
+ "<extra_id_11>",
42
+ "<extra_id_12>",
43
+ "<extra_id_13>",
44
+ "<extra_id_14>",
45
+ "<extra_id_15>",
46
+ "<extra_id_16>",
47
+ "<extra_id_17>",
48
+ "<extra_id_18>",
49
+ "<extra_id_19>",
50
+ "<extra_id_20>",
51
+ "<extra_id_21>",
52
+ "<extra_id_22>",
53
+ "<extra_id_23>",
54
+ "<extra_id_24>",
55
+ "<extra_id_25>",
56
+ "<extra_id_26>",
57
+ "<extra_id_27>",
58
+ "<extra_id_28>",
59
+ "<extra_id_29>",
60
+ "<extra_id_30>",
61
+ "<extra_id_31>",
62
+ "<extra_id_32>",
63
+ "<extra_id_33>",
64
+ "<extra_id_34>",
65
+ "<extra_id_35>",
66
+ "<extra_id_36>",
67
+ "<extra_id_37>",
68
+ "<extra_id_38>",
69
+ "<extra_id_39>",
70
+ "<extra_id_40>",
71
+ "<extra_id_41>",
72
+ "<extra_id_42>",
73
+ "<extra_id_43>",
74
+ "<extra_id_44>",
75
+ "<extra_id_45>",
76
+ "<extra_id_46>",
77
+ "<extra_id_47>",
78
+ "<extra_id_48>",
79
+ "<extra_id_49>",
80
+ "<extra_id_50>",
81
+ "<extra_id_51>",
82
+ "<extra_id_52>",
83
+ "<extra_id_53>",
84
+ "<extra_id_54>",
85
+ "<extra_id_55>",
86
+ "<extra_id_56>",
87
+ "<extra_id_57>",
88
+ "<extra_id_58>",
89
+ "<extra_id_59>",
90
+ "<extra_id_60>",
91
+ "<extra_id_61>",
92
+ "<extra_id_62>",
93
+ "<extra_id_63>",
94
+ "<extra_id_64>",
95
+ "<extra_id_65>",
96
+ "<extra_id_66>",
97
+ "<extra_id_67>",
98
+ "<extra_id_68>",
99
+ "<extra_id_69>",
100
+ "<extra_id_70>",
101
+ "<extra_id_71>",
102
+ "<extra_id_72>",
103
+ "<extra_id_73>",
104
+ "<extra_id_74>",
105
+ "<extra_id_75>",
106
+ "<extra_id_76>",
107
+ "<extra_id_77>",
108
+ "<extra_id_78>",
109
+ "<extra_id_79>",
110
+ "<extra_id_80>",
111
+ "<extra_id_81>",
112
+ "<extra_id_82>",
113
+ "<extra_id_83>",
114
+ "<extra_id_84>",
115
+ "<extra_id_85>",
116
+ "<extra_id_86>",
117
+ "<extra_id_87>",
118
+ "<extra_id_88>",
119
+ "<extra_id_89>",
120
+ "<extra_id_90>",
121
+ "<extra_id_91>",
122
+ "<extra_id_92>",
123
+ "<extra_id_93>",
124
+ "<extra_id_94>",
125
+ "<extra_id_95>",
126
+ "<extra_id_96>",
127
+ "<extra_id_97>",
128
+ "<extra_id_98>",
129
+ "<extra_id_99>"
130
+ ],
131
+ "backend": "tokenizers",
132
+ "eos_token": "</s>",
133
+ "extra_ids": 100,
134
+ "extra_special_tokens": [
135
+ "<epsilon>",
136
+ "<s>",
137
+ "</s>",
138
+ "<extra_id_0>",
139
+ "<extra_id_1>",
140
+ "<extra_id_2>",
141
+ "<extra_id_3>",
142
+ "<extra_id_4>",
143
+ "<extra_id_5>",
144
+ "<extra_id_6>",
145
+ "<extra_id_7>",
146
+ "<extra_id_8>",
147
+ "<extra_id_9>",
148
+ "<extra_id_10>",
149
+ "<extra_id_11>",
150
+ "<extra_id_12>",
151
+ "<extra_id_13>",
152
+ "<extra_id_14>",
153
+ "<extra_id_15>",
154
+ "<extra_id_16>",
155
+ "<extra_id_17>",
156
+ "<extra_id_18>",
157
+ "<extra_id_19>",
158
+ "<extra_id_20>",
159
+ "<extra_id_21>",
160
+ "<extra_id_22>",
161
+ "<extra_id_23>",
162
+ "<extra_id_24>",
163
+ "<extra_id_25>",
164
+ "<extra_id_26>",
165
+ "<extra_id_27>",
166
+ "<extra_id_28>",
167
+ "<extra_id_29>",
168
+ "<extra_id_30>",
169
+ "<extra_id_31>",
170
+ "<extra_id_32>",
171
+ "<extra_id_33>",
172
+ "<extra_id_34>",
173
+ "<extra_id_35>",
174
+ "<extra_id_36>",
175
+ "<extra_id_37>",
176
+ "<extra_id_38>",
177
+ "<extra_id_39>",
178
+ "<extra_id_40>",
179
+ "<extra_id_41>",
180
+ "<extra_id_42>",
181
+ "<extra_id_43>",
182
+ "<extra_id_44>",
183
+ "<extra_id_45>",
184
+ "<extra_id_46>",
185
+ "<extra_id_47>",
186
+ "<extra_id_48>",
187
+ "<extra_id_49>",
188
+ "<extra_id_50>",
189
+ "<extra_id_51>",
190
+ "<extra_id_52>",
191
+ "<extra_id_53>",
192
+ "<extra_id_54>",
193
+ "<extra_id_55>",
194
+ "<extra_id_56>",
195
+ "<extra_id_57>",
196
+ "<extra_id_58>",
197
+ "<extra_id_59>",
198
+ "<extra_id_60>",
199
+ "<extra_id_61>",
200
+ "<extra_id_62>",
201
+ "<extra_id_63>",
202
+ "<extra_id_64>",
203
+ "<extra_id_65>",
204
+ "<extra_id_66>",
205
+ "<extra_id_67>",
206
+ "<extra_id_68>",
207
+ "<extra_id_69>",
208
+ "<extra_id_70>",
209
+ "<extra_id_71>",
210
+ "<extra_id_72>",
211
+ "<extra_id_73>",
212
+ "<extra_id_74>",
213
+ "<extra_id_75>",
214
+ "<extra_id_76>",
215
+ "<extra_id_77>",
216
+ "<extra_id_78>",
217
+ "<extra_id_79>",
218
+ "<extra_id_80>",
219
+ "<extra_id_81>",
220
+ "<extra_id_82>",
221
+ "<extra_id_83>",
222
+ "<extra_id_84>",
223
+ "<extra_id_85>",
224
+ "<extra_id_86>",
225
+ "<extra_id_87>",
226
+ "<extra_id_88>",
227
+ "<extra_id_89>",
228
+ "<extra_id_90>",
229
+ "<extra_id_91>",
230
+ "<extra_id_92>",
231
+ "<extra_id_93>",
232
+ "<extra_id_94>",
233
+ "<extra_id_95>",
234
+ "<extra_id_96>",
235
+ "<extra_id_97>",
236
+ "<extra_id_98>",
237
+ "<extra_id_99>"
238
+ ],
239
+ "is_local": false,
240
+ "model_max_length": 1000000000000000019884624838656,
241
+ "pad_token": "<epsilon>",
242
+ "processor_class": "LasrProcessor",
243
+ "tokenizer_class": "LasrTokenizer",
244
+ "unk_id": 3,
245
+ "unk_token": "<unk>"
246
+ }
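The `extra_special_tokens` list in this config is fully regular: three named tokens followed by `extra_ids` sentinel tokens. As a rough sketch (the variable names are illustrative, and this is not how the tokenizer library builds the list internally), it can be regenerated to sanity-check the file:

```python
# Value of "extra_ids" in tokenizer_config.json above.
EXTRA_IDS = 100

# The three named special tokens, in the order they appear in the config.
named_specials = ["<epsilon>", "<s>", "</s>"]

# Sentinel tokens <extra_id_0> ... <extra_id_99>.
sentinels = [f"<extra_id_{i}>" for i in range(EXTRA_IDS)]

extra_special_tokens = named_specials + sentinels

print(len(extra_special_tokens))  # 3 named tokens + 100 sentinels = 103
print(extra_special_tokens[:4], extra_special_tokens[-1])
```

Comparing this generated list against the parsed JSON is a cheap way to catch a truncated or reordered token list after a conversion step.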
quantization_report.json ADDED
@@ -0,0 +1,20 @@
1
+ {
2
+ "timestamp_utc": "2026-02-09T20:38:09.856840+00:00",
3
+ "source_model_dir": "artifacts/medasr-mlx-fp16",
4
+ "output_model_dir": "artifacts/medasr-mlx-int8",
5
+ "quantization": {
6
+ "bits": 8,
7
+ "group_size": 64,
8
+ "mode": "affine",
9
+ "target_modules": "mlx.nn.quantize default predicate (Linear/Embedding layers)"
10
+ },
11
+ "timing": {
12
+ "quantization_time_s": 0.0034
13
+ },
14
+ "size_mb": {
15
+ "source_weights": 200.9,
16
+ "output_weights": 121.108,
17
+ "compression_ratio_x": 1.659,
18
+ "reduction_percent": 39.72
19
+ }
20
+ }
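The two derived fields in `size_mb` follow directly from the source and output sizes. A small check, assuming the usual definitions (ratio = source / output, reduction = 1 - output / source), reproduces the reported values:

```python
# Sizes copied from the "size_mb" block of quantization_report.json.
source_mb = 200.9
output_mb = 121.108

# Assumed definitions; the report does not spell out its formulas.
compression_ratio = source_mb / output_mb
reduction_percent = (1 - output_mb / source_mb) * 100

print(f"compression_ratio_x: {compression_ratio:.3f}")
print(f"reduction_percent:   {reduction_percent:.2f}")
```

The results round to the 1.659x and 39.72% in the report, which the README summarizes as "40% smaller".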
weights.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e7f32ad3faaf0c6eb957f30dabad89026a5fc0a42cebdef8647c00dfe2c51009
3
+ size 126991066
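The three lines above are a Git LFS pointer, not the weights themselves: each line is a `key value` pair. A minimal sketch of parsing the pointer and converting its byte count to mebibytes (`parse_lfs_pointer` is a hypothetical helper written for this example, not a Git LFS or Hugging Face API):

```python
# Pointer contents copied from the weights.npz diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e7f32ad3faaf0c6eb957f30dabad89026a5fc0a42cebdef8647c00dfe2c51009
size 126991066
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(
        line.split(" ", 1) for line in text.splitlines() if line.strip()
    )
    return {
        "version": fields["version"],
        "oid": fields["oid"],
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)

# Assumption: the report's "size_mb" values are mebibytes (2**20 bytes).
size_mib = pointer["size_bytes"] / 2**20
print(f"{size_mib:.3f} MiB")
```

Under the mebibyte assumption the pointer size matches the `output_weights` value of 121.108 in `quantization_report.json`, which suggests the report measured this same file.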