danielhanchen committed on
Commit
d872dbc
·
verified ·
1 Parent(s): 706b8e7

Add files using upload-large-folder tool

Files changed (50)
  1. .gitattributes +2 -0
  2. README.md +149 -0
  3. chat_template.jinja +86 -0
  4. config.json +782 -0
  5. generation_config.json +12 -0
  6. model-00001-of-00142.safetensors +3 -0
  7. model-00002-of-00142.safetensors +3 -0
  8. model-00003-of-00142.safetensors +3 -0
  9. model-00004-of-00142.safetensors +3 -0
  10. model-00005-of-00142.safetensors +3 -0
  11. model-00006-of-00142.safetensors +3 -0
  12. model-00007-of-00142.safetensors +3 -0
  13. model-00008-of-00142.safetensors +3 -0
  14. model-00009-of-00142.safetensors +3 -0
  15. model-00010-of-00142.safetensors +3 -0
  16. model-00011-of-00142.safetensors +3 -0
  17. model-00012-of-00142.safetensors +3 -0
  18. model-00013-of-00142.safetensors +3 -0
  19. model-00014-of-00142.safetensors +3 -0
  20. model-00015-of-00142.safetensors +3 -0
  21. model-00016-of-00142.safetensors +3 -0
  22. model-00017-of-00142.safetensors +3 -0
  23. model-00018-of-00142.safetensors +3 -0
  24. model-00019-of-00142.safetensors +3 -0
  25. model-00020-of-00142.safetensors +3 -0
  26. model-00021-of-00142.safetensors +3 -0
  27. model-00022-of-00142.safetensors +3 -0
  28. model-00023-of-00142.safetensors +3 -0
  29. model-00024-of-00142.safetensors +3 -0
  30. model-00025-of-00142.safetensors +3 -0
  31. model-00026-of-00142.safetensors +3 -0
  32. model-00027-of-00142.safetensors +3 -0
  33. model-00028-of-00142.safetensors +3 -0
  34. model-00029-of-00142.safetensors +3 -0
  35. model-00030-of-00142.safetensors +3 -0
  36. model-00031-of-00142.safetensors +3 -0
  37. model-00032-of-00142.safetensors +3 -0
  38. model-00033-of-00142.safetensors +3 -0
  39. model-00034-of-00142.safetensors +3 -0
  40. model-00035-of-00142.safetensors +3 -0
  41. model-00036-of-00142.safetensors +3 -0
  42. model-00037-of-00142.safetensors +3 -0
  43. model-00038-of-00142.safetensors +3 -0
  44. model-00039-of-00142.safetensors +3 -0
  45. model-00040-of-00142.safetensors +3 -0
  46. model-00041-of-00142.safetensors +3 -0
  47. model-00042-of-00142.safetensors +3 -0
  48. model-00043-of-00142.safetensors +3 -0
  49. model-00044-of-00142.safetensors +3 -0
  50. tokenizer_config.json +33 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,149 @@
---
language:
- en
- zh
library_name: transformers
license: mit
pipeline_tag: text-generation
---

# GLM-5-FP8

<div align="center">
<img src=https://raw.githubusercontent.com/zai-org/GLM-5/refs/heads/main/resources/logo.svg width="15%"/>
</div>
<p align="center">
👋 Join our <a href="https://raw.githubusercontent.com/zai-org/GLM-5/refs/heads/main/resources/wechat.png" target="_blank">WeChat</a> or <a href="https://discord.gg/QR7SARHRxK" target="_blank">Discord</a> community.
<br>
📖 Check out the GLM-5 <a href="https://z.ai/blog/glm-5" target="_blank">technical blog</a>.
<br>
📍 Use the GLM-5 API on the <a href="https://docs.z.ai/guides/llm/glm-5">Z.ai API Platform</a>.
<br>
👉 Try <a href="https://chat.z.ai">GLM-5</a> with one click.
</p>

## Introduction

We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling remains one of the most important ways to improve intelligence on the path toward Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active) and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), substantially reducing deployment cost while preserving long-context capability.

Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models, but deploying it at scale for LLMs is challenging due to the inefficiency of RL training. To this end, we developed [slime](https://github.com/THUDM/slime), a novel **asynchronous RL infrastructure** that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvements over GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among open-source models on reasoning, coding, and agentic tasks, closing the gap with frontier models.

## Benchmark

| | GLM-5 | GLM-4.7 | DeepSeek-V3.2 | Kimi K2.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (xhigh) |
| -------------------------------- | ---------------------- | --------- | ------------- |-----------| --------------- | ------------ | --------------- |
| HLE | 30.5 | 24.8 | 25.1 | 31.5 | 28.4 | 37.2 | 35.4 |
| HLE (w/ Tools) | 50.4 | 42.8 | 40.8 | 51.8 | 43.4* | 45.8* | 45.5* |
| AIME 2026 I | 92.7 | 92.9 | 92.7 | 92.5 | 93.3 | 90.6 | - |
| HMMT Nov. 2025 | 96.9 | 93.5 | 90.2 | 91.1 | 91.7 | 93.0 | 97.1 |
| IMOAnswerBench | 82.5 | 82.0 | 78.3 | 81.8 | 78.5 | 83.3 | 86.3 |
| GPQA-Diamond | 86.0 | 85.7 | 82.4 | 87.6 | 87.0 | 91.9 | 92.4 |
| SWE-bench Verified | 77.8 | 73.8 | 73.1 | 76.8 | 80.9 | 76.2 | 80.0 |
| SWE-bench Multilingual | 73.3 | 66.7 | 70.2 | 73.0 | 77.5 | 65.0 | 72.0 |
| Terminal-Bench 2.0 (Terminus 2) | 56.2 / 60.7 † | 41.0 | 39.3 | 50.8 | 59.3 | 54.2 | 54.0 |
| Terminal-Bench 2.0 (Claude Code) | 56.2 / 61.1 † | 32.8 | 46.4 | - | 57.9 | - | - |
| CyberGym | 43.2 | 23.5 | 17.3 | 41.3 | 50.6 | 39.9 | - |
| BrowseComp | 62.0 | 52.0 | 51.4 | 60.6 | 37.0 | 37.8 | - |
| BrowseComp (w/ Context Manage) | 75.9 | 67.5 | 67.6 | 74.9 | 67.8 | 59.2 | 65.8 |
| BrowseComp-Zh | 72.7 | 66.6 | 65.0 | 62.3 | 62.4 | 66.8 | 76.1 |
| τ²-Bench | 89.7 | 87.4 | 85.3 | 80.2 | 91.6 | 90.7 | 85.5 |
| MCP-Atlas (Public Set) | 67.8 | 52.0 | 62.2 | 63.8 | 65.2 | 66.6 | 68.0 |
| Tool-Decathlon | 38.0 | 23.8 | 35.2 | 27.8 | 43.5 | 36.4 | 46.3 |
| Vending Bench 2 | $4,432.12 | $2,376.82 | $1,034.00 | $1,198.46 | $4,967.06 | $5,478.16 | $3,591.33 |

> *: scores on the full set.
>
> †: a verified version of Terminal-Bench 2.0 that fixes some ambiguous instructions.
>
> See the footnotes below for more evaluation details.

### Footnotes

* **Humanity’s Last Exam (HLE) & other reasoning tasks**: We evaluate with a maximum generation length of 131,072 tokens (`temperature=1.0, top_p=0.95, max_new_tokens=131072`). By default, we report the text-only subset; results marked with * are from the full set. We use GPT-5.2 (medium) as the judge model. For HLE-with-tools, we use a maximum context length of 202,752 tokens.
* **SWE-bench & SWE-bench Multilingual**: We run the SWE-bench suite with OpenHands using a tailored instruction prompt. Settings: `temperature=0.7, top_p=0.95, max_new_tokens=16384`, with a 200K context window.
* **BrowseComp**: Without context management, we retain details from the most recent 5 turns. With context management, we use the same discard-all strategy as DeepSeek-V3.2 and Kimi K2.5.
* **Terminal-Bench 2.0 (Terminus 2)**: We evaluate with the Terminus framework using `timeout=2h, temperature=0.7, top_p=1.0, max_new_tokens=8192`, with a 128K context window. Resource limits are capped at 16 CPUs and 32 GB RAM.
* **Terminal-Bench 2.0 (Claude Code)**: We evaluate in Claude Code 2.1.14 (think mode, default effort) with `temperature=1.0, top_p=0.95, max_new_tokens=65536`. We remove wall-clock time limits due to generation speed, while preserving per-task CPU and memory constraints. Scores are averaged over 5 runs. We fix environment issues introduced by Claude Code and also report results on a verified Terminal-Bench 2.0 dataset that resolves ambiguous instructions (see: [https://huggingface.co/datasets/zai-org/terminal-bench-2-verified](https://huggingface.co/datasets/zai-org/terminal-bench-2-verified)).
* **CyberGym**: We evaluate in Claude Code 2.1.18 (think mode, no web tools) with `temperature=1.0, top_p=1.0, max_new_tokens=32000` and a 250-minute timeout per task. Results are single-run Pass@1 over 1,507 tasks.
* **MCP-Atlas**: All models are evaluated in think mode on the 500-task public subset with a 10-minute timeout per task. We use Gemini 3 Pro as the judge model.
* **τ²-Bench**: We add a small prompt adjustment in Retail and Telecom to avoid failures caused by premature user termination. For Airline, we apply the domain fixes proposed in the Claude Opus 4.5 system card.
* **Vending Bench 2**: Runs are conducted independently by [Andon Labs](https://andonlabs.com/evals/vending-bench-2).

## Serve GLM-5 Locally

### Prepare environment

vLLM, SGLang, and xLLM all support local deployment of GLM-5. A brief deployment guide is provided below.

+ vLLM

Using Docker:

```shell
docker pull vllm/vllm-openai:nightly
```

or using pip:

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
```

then upgrade transformers:

```shell
pip install git+https://github.com/huggingface/transformers.git
```

+ SGLang

Using Docker:

```bash
docker pull lmsysorg/sglang:glm5-hopper     # for Hopper GPUs
docker pull lmsysorg/sglang:glm5-blackwell  # for Blackwell GPUs
```

### Deploy

+ vLLM

```shell
vllm serve zai-org/GLM-5-FP8 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.85 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --served-model-name glm-5-fp8
```

Check the [recipes](https://github.com/vllm-project/recipes/blob/main/GLM/GLM5.md) for more details.
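Once the server is running, it exposes an OpenAI-compatible HTTP API. A minimal client sketch follows, assuming vLLM's default port 8000 and the `--served-model-name glm-5-fp8` flag above; the prompt text is illustrative.

```python
import json
from urllib.request import Request, urlopen

# Chat-completions payload; "glm-5-fp8" must match --served-model-name.
payload = {
    "model": "glm-5-fp8",
    "messages": [{"role": "user", "content": "Briefly explain sparse attention."}],
    "temperature": 1.0,
    "top_p": 0.95,
}

# Default vLLM port 8000 is an assumption; adjust if you changed it.
req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is up:
# with urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI-compatible SDK can be pointed at the same base URL instead of raw HTTP.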

+ SGLang

```shell
python3 -m sglang.launch_server \
  --model-path zai-org/GLM-5-FP8 \
  --tp-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.85 \
  --served-model-name glm-5-fp8
```

Check the [SGLang cookbook](https://cookbook.sglang.io/autoregressive/GLM/GLM-5) for more details.

+ xLLM and other Ascend NPU platforms

Please check the deployment guide [here](https://github.com/zai-org/GLM-5/blob/main/example/ascend.md).

## Citation

Our technical report is coming soon.
chat_template.jinja ADDED
@@ -0,0 +1,86 @@
[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson(ensure_ascii=False) }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
{%- if content is string -%}
{{- content }}
{%- elif content is iterable and content is not mapping -%}
{%- for item in content -%}
{%- if item is mapping and item.type == 'text' -%}
{{- item.text }}
{%- elif item is string -%}
{{- item }}
{%- endif -%}
{%- endfor -%}
{%- else -%}
{{- content }}
{%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
{%- if m.role == 'user' %}
{% set ns.last_user_index = loop.index0 -%}
{%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
{%- set reasoning_content = m.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
{{ '<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '</think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
{%- set tc = tc.function %}
{%- endif %}
{{- '<tool_call>' + tc.name -}}
{% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|observation|>' }}
{%- endif %}
{{- '<tool_response>' }}
{{- m.content }}
{{- '</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}
<tool_response>{{ tr.output if tr.output is defined else tr }}</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
<|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
{%- endif -%}
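The tool-call serialization branch of the template above can be hard to read in Jinja. A small illustrative Python equivalent of that one branch (not the official renderer; the function name and example tool are hypothetical) shows the wire format it produces: string argument values are emitted as-is, everything else is JSON-encoded, matching the template's `tojson` fallback.

```python
import json

def render_tool_call(name, arguments):
    """Serialize one tool call the way the chat template does:
    <tool_call>{name}<arg_key>k</arg_key><arg_value>v</arg_value>...</tool_call>
    Strings pass through unquoted; other values are JSON-encoded."""
    parts = [f"<tool_call>{name}"]
    for key, value in arguments.items():
        rendered = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        parts.append(f"<arg_key>{key}</arg_key><arg_value>{rendered}</arg_value>")
    parts.append("</tool_call>")
    return "".join(parts)

print(render_tool_call("get_weather", {"city": "Beijing", "days": 3}))
# → <tool_call>get_weather<arg_key>city</arg_key><arg_value>Beijing</arg_value><arg_key>days</arg_key><arg_value>3</arg_value></tool_call>
```

Note that "Beijing" appears without quotes (string pass-through) while the integer 3 goes through `json.dumps`.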
config.json ADDED
@@ -0,0 +1,782 @@
{
  "architectures": [
    "GlmMoeDsaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "dtype": "bfloat16",
  "eos_token_id": [
    154820,
    154827,
    154829
  ],
  "ep_size": 1,
  "first_k_dense_replace": 3,
  "hidden_act": "silu",
  "head_dim": 64,
  "hidden_size": 6144,
  "index_head_dim": 128,
  "index_n_heads": 32,
  "index_topk": 2048,
  "indexer_rope_interleave": true,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "kv_lora_rank": 512,
  "max_position_embeddings": 202752,
  "moe_intermediate_size": 2048,
  "moe_layer_freq": 1,
  "model_type": "glm_moe_dsa",
  "n_group": 1,
  "n_routed_experts": 256,
  "n_shared_experts": 1,
  "norm_topk_prob": true,
  "num_attention_heads": 64,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 78,
  "num_key_value_heads": 64,
  "num_nextn_predict_layers": 1,
  "pad_token_id": 154820,
  "pretraining_tp": 1,
  "q_lora_rank": 2048,
  "qk_head_dim": 256,
  "qk_nope_head_dim": 192,
  "qk_rope_head_dim": 64,
  "rms_norm_eps": 1e-05,
  "rope_interleave": true,
  "rope_parameters": {
    "rope_theta": 1000000,
    "rope_type": "default"
  },
  "routed_scaling_factor": 2.5,
  "scoring_func": "sigmoid",
  "tie_word_embeddings": false,
  "topk_group": 1,
  "topk_method": "noaux_tc",
  "transformers_version": "5.0.2.dev0",
  "use_cache": true,
  "v_head_dim": 256,
  "vocab_size": 154880,
  "quantization_config": {
    "activation_scheme": "dynamic",
    "fmt": "e4m3",
    "quant_method": "fp8",
    "weight_block_size": [
      128,
      128
    ],
    "modules_to_not_convert": [
      "lm_head",
      "model.embed_tokens",
      "model.layers.0.input_layernorm",
      "model.layers.0.post_attention_layernorm",
      "model.layers.0.self_attn.indexer.k_norm",
      "model.layers.0.self_attn.indexer.k_norm.bias",
      "model.layers.0.self_attn.indexers_proj",
      "model.layers.0.self_attn.kv_a_layernorm",
      "model.layers.0.self_attn.q_a_layernorm",
      "model.layers.1.input_layernorm",
      "model.layers.1.post_attention_layernorm",
      "model.layers.1.self_attn.indexer.k_norm",
      "model.layers.1.self_attn.indexer.k_norm.bias",
      "model.layers.1.self_attn.indexers_proj",
      "model.layers.1.self_attn.kv_a_layernorm",
      "model.layers.1.self_attn.q_a_layernorm",
      "model.layers.2.input_layernorm",
      "model.layers.2.post_attention_layernorm",
      "model.layers.2.self_attn.indexer.k_norm",
      "model.layers.2.self_attn.indexer.k_norm.bias",
      "model.layers.2.self_attn.indexers_proj",
      "model.layers.2.self_attn.kv_a_layernorm",
      "model.layers.2.self_attn.q_a_layernorm",
      "model.layers.3.input_layernorm",
      "model.layers.3.mlp.gate",
      "model.layers.3.mlp.gate.e_score_correction_bias",
      "model.layers.3.post_attention_layernorm",
      "model.layers.3.self_attn.indexer.k_norm",
      "model.layers.3.self_attn.indexer.k_norm.bias",
      "model.layers.3.self_attn.indexers_proj",
      "model.layers.3.self_attn.kv_a_layernorm",
      "model.layers.3.self_attn.q_a_layernorm",
      "model.layers.4.input_layernorm",
      "model.layers.4.mlp.gate",
      "model.layers.4.mlp.gate.e_score_correction_bias",
      "model.layers.4.post_attention_layernorm",
      "model.layers.4.self_attn.indexer.k_norm",
      "model.layers.4.self_attn.indexer.k_norm.bias",
      "model.layers.4.self_attn.indexers_proj",
      "model.layers.4.self_attn.kv_a_layernorm",
      "model.layers.4.self_attn.q_a_layernorm",
      "model.layers.5.input_layernorm",
      "model.layers.5.mlp.gate",
      "model.layers.5.mlp.gate.e_score_correction_bias",
      "model.layers.5.post_attention_layernorm",
      "model.layers.5.self_attn.indexer.k_norm",
      "model.layers.5.self_attn.indexer.k_norm.bias",
      "model.layers.5.self_attn.indexers_proj",
      "model.layers.5.self_attn.kv_a_layernorm",
      "model.layers.5.self_attn.q_a_layernorm",
      "model.layers.6.input_layernorm",
      "model.layers.6.mlp.gate",
      "model.layers.6.mlp.gate.e_score_correction_bias",
      "model.layers.6.post_attention_layernorm",
      "model.layers.6.self_attn.indexer.k_norm",
      "model.layers.6.self_attn.indexer.k_norm.bias",
      "model.layers.6.self_attn.indexers_proj",
      "model.layers.6.self_attn.kv_a_layernorm",
      "model.layers.6.self_attn.q_a_layernorm",
      "model.layers.7.input_layernorm",
      "model.layers.7.mlp.gate",
      "model.layers.7.mlp.gate.e_score_correction_bias",
      "model.layers.7.post_attention_layernorm",
      "model.layers.7.self_attn.indexer.k_norm",
      "model.layers.7.self_attn.indexer.k_norm.bias",
      "model.layers.7.self_attn.indexers_proj",
      "model.layers.7.self_attn.kv_a_layernorm",
      "model.layers.7.self_attn.q_a_layernorm",
      "model.layers.8.input_layernorm",
      "model.layers.8.mlp.gate",
      "model.layers.8.mlp.gate.e_score_correction_bias",
      "model.layers.8.post_attention_layernorm",
      "model.layers.8.self_attn.indexer.k_norm",
      "model.layers.8.self_attn.indexer.k_norm.bias",
      "model.layers.8.self_attn.indexers_proj",
      "model.layers.8.self_attn.kv_a_layernorm",
      "model.layers.8.self_attn.q_a_layernorm",
      "model.layers.9.input_layernorm",
      "model.layers.9.mlp.gate",
      "model.layers.9.mlp.gate.e_score_correction_bias",
      "model.layers.9.post_attention_layernorm",
      "model.layers.9.self_attn.indexer.k_norm",
      "model.layers.9.self_attn.indexer.k_norm.bias",
      "model.layers.9.self_attn.indexers_proj",
      "model.layers.9.self_attn.kv_a_layernorm",
      "model.layers.9.self_attn.q_a_layernorm",
      "model.layers.10.input_layernorm",
      "model.layers.10.mlp.gate",
      "model.layers.10.mlp.gate.e_score_correction_bias",
      "model.layers.10.post_attention_layernorm",
      "model.layers.10.self_attn.indexer.k_norm",
      "model.layers.10.self_attn.indexer.k_norm.bias",
      "model.layers.10.self_attn.indexers_proj",
      "model.layers.10.self_attn.kv_a_layernorm",
      "model.layers.10.self_attn.q_a_layernorm",
      "model.layers.11.input_layernorm",
      "model.layers.11.mlp.gate",
      "model.layers.11.mlp.gate.e_score_correction_bias",
      "model.layers.11.post_attention_layernorm",
      "model.layers.11.self_attn.indexer.k_norm",
      "model.layers.11.self_attn.indexer.k_norm.bias",
      "model.layers.11.self_attn.indexers_proj",
      "model.layers.11.self_attn.kv_a_layernorm",
      "model.layers.11.self_attn.q_a_layernorm",
      "model.layers.12.input_layernorm",
      "model.layers.12.mlp.gate",
      "model.layers.12.mlp.gate.e_score_correction_bias",
      "model.layers.12.post_attention_layernorm",
      "model.layers.12.self_attn.indexer.k_norm",
      "model.layers.12.self_attn.indexer.k_norm.bias",
      "model.layers.12.self_attn.indexers_proj",
      "model.layers.12.self_attn.kv_a_layernorm",
      "model.layers.12.self_attn.q_a_layernorm",
      "model.layers.13.input_layernorm",
      "model.layers.13.mlp.gate",
      "model.layers.13.mlp.gate.e_score_correction_bias",
      "model.layers.13.post_attention_layernorm",
      "model.layers.13.self_attn.indexer.k_norm",
      "model.layers.13.self_attn.indexer.k_norm.bias",
      "model.layers.13.self_attn.indexers_proj",
      "model.layers.13.self_attn.kv_a_layernorm",
      "model.layers.13.self_attn.q_a_layernorm",
      "model.layers.14.input_layernorm",
      "model.layers.14.mlp.gate",
      "model.layers.14.mlp.gate.e_score_correction_bias",
      "model.layers.14.post_attention_layernorm",
      "model.layers.14.self_attn.indexer.k_norm",
      "model.layers.14.self_attn.indexer.k_norm.bias",
      "model.layers.14.self_attn.indexers_proj",
      "model.layers.14.self_attn.kv_a_layernorm",
      "model.layers.14.self_attn.q_a_layernorm",
      "model.layers.15.input_layernorm",
      "model.layers.15.mlp.gate",
      "model.layers.15.mlp.gate.e_score_correction_bias",
      "model.layers.15.post_attention_layernorm",
      "model.layers.15.self_attn.indexer.k_norm",
      "model.layers.15.self_attn.indexer.k_norm.bias",
      "model.layers.15.self_attn.indexers_proj",
      "model.layers.15.self_attn.kv_a_layernorm",
      "model.layers.15.self_attn.q_a_layernorm",
      "model.layers.16.input_layernorm",
      "model.layers.16.mlp.gate",
      "model.layers.16.mlp.gate.e_score_correction_bias",
      "model.layers.16.post_attention_layernorm",
      "model.layers.16.self_attn.indexer.k_norm",
      "model.layers.16.self_attn.indexer.k_norm.bias",
      "model.layers.16.self_attn.indexers_proj",
      "model.layers.16.self_attn.kv_a_layernorm",
      "model.layers.16.self_attn.q_a_layernorm",
      "model.layers.17.input_layernorm",
      "model.layers.17.mlp.gate",
      "model.layers.17.mlp.gate.e_score_correction_bias",
      "model.layers.17.post_attention_layernorm",
      "model.layers.17.self_attn.indexer.k_norm",
      "model.layers.17.self_attn.indexer.k_norm.bias",
      "model.layers.17.self_attn.indexers_proj",
      "model.layers.17.self_attn.kv_a_layernorm",
      "model.layers.17.self_attn.q_a_layernorm",
      "model.layers.18.input_layernorm",
      "model.layers.18.mlp.gate",
      "model.layers.18.mlp.gate.e_score_correction_bias",
      "model.layers.18.post_attention_layernorm",
      "model.layers.18.self_attn.indexer.k_norm",
      "model.layers.18.self_attn.indexer.k_norm.bias",
      "model.layers.18.self_attn.indexers_proj",
      "model.layers.18.self_attn.kv_a_layernorm",
      "model.layers.18.self_attn.q_a_layernorm",
      "model.layers.19.input_layernorm",
      "model.layers.19.mlp.gate",
      "model.layers.19.mlp.gate.e_score_correction_bias",
      "model.layers.19.post_attention_layernorm",
      "model.layers.19.self_attn.indexer.k_norm",
      "model.layers.19.self_attn.indexer.k_norm.bias",
      "model.layers.19.self_attn.indexers_proj",
      "model.layers.19.self_attn.kv_a_layernorm",
      "model.layers.19.self_attn.q_a_layernorm",
      "model.layers.20.input_layernorm",
      "model.layers.20.mlp.gate",
      "model.layers.20.mlp.gate.e_score_correction_bias",
      "model.layers.20.post_attention_layernorm",
      "model.layers.20.self_attn.indexer.k_norm",
      "model.layers.20.self_attn.indexer.k_norm.bias",
      "model.layers.20.self_attn.indexers_proj",
      "model.layers.20.self_attn.kv_a_layernorm",
      "model.layers.20.self_attn.q_a_layernorm",
      "model.layers.21.input_layernorm",
      "model.layers.21.mlp.gate",
      "model.layers.21.mlp.gate.e_score_correction_bias",
      "model.layers.21.post_attention_layernorm",
      "model.layers.21.self_attn.indexer.k_norm",
      "model.layers.21.self_attn.indexer.k_norm.bias",
      "model.layers.21.self_attn.indexers_proj",
      "model.layers.21.self_attn.kv_a_layernorm",
      "model.layers.21.self_attn.q_a_layernorm",
      "model.layers.22.input_layernorm",
      "model.layers.22.mlp.gate",
      "model.layers.22.mlp.gate.e_score_correction_bias",
      "model.layers.22.post_attention_layernorm",
      "model.layers.22.self_attn.indexer.k_norm",
      "model.layers.22.self_attn.indexer.k_norm.bias",
      "model.layers.22.self_attn.indexers_proj",
      "model.layers.22.self_attn.kv_a_layernorm",
      "model.layers.22.self_attn.q_a_layernorm",
      "model.layers.23.input_layernorm",
      "model.layers.23.mlp.gate",
      "model.layers.23.mlp.gate.e_score_correction_bias",
      "model.layers.23.post_attention_layernorm",
      "model.layers.23.self_attn.indexer.k_norm",
      "model.layers.23.self_attn.indexer.k_norm.bias",
      "model.layers.23.self_attn.indexers_proj",
      "model.layers.23.self_attn.kv_a_layernorm",
      "model.layers.23.self_attn.q_a_layernorm",
      "model.layers.24.input_layernorm",
      "model.layers.24.mlp.gate",
      "model.layers.24.mlp.gate.e_score_correction_bias",
      "model.layers.24.post_attention_layernorm",
      "model.layers.24.self_attn.indexer.k_norm",
      "model.layers.24.self_attn.indexer.k_norm.bias",
      "model.layers.24.self_attn.indexers_proj",
      "model.layers.24.self_attn.kv_a_layernorm",
      "model.layers.24.self_attn.q_a_layernorm",
      "model.layers.25.input_layernorm",
      "model.layers.25.mlp.gate",
      "model.layers.25.mlp.gate.e_score_correction_bias",
      "model.layers.25.post_attention_layernorm",
      "model.layers.25.self_attn.indexer.k_norm",
      "model.layers.25.self_attn.indexer.k_norm.bias",
      "model.layers.25.self_attn.indexers_proj",
      "model.layers.25.self_attn.kv_a_layernorm",
      "model.layers.25.self_attn.q_a_layernorm",
      "model.layers.26.input_layernorm",
      "model.layers.26.mlp.gate",
      "model.layers.26.mlp.gate.e_score_correction_bias",
      "model.layers.26.post_attention_layernorm",
      "model.layers.26.self_attn.indexer.k_norm",
      "model.layers.26.self_attn.indexer.k_norm.bias",
      "model.layers.26.self_attn.indexers_proj",
      "model.layers.26.self_attn.kv_a_layernorm",
      "model.layers.26.self_attn.q_a_layernorm",
      "model.layers.27.input_layernorm",
      "model.layers.27.mlp.gate",
      "model.layers.27.mlp.gate.e_score_correction_bias",
      "model.layers.27.post_attention_layernorm",
      "model.layers.27.self_attn.indexer.k_norm",
      "model.layers.27.self_attn.indexer.k_norm.bias",
      "model.layers.27.self_attn.indexers_proj",
      "model.layers.27.self_attn.kv_a_layernorm",
      "model.layers.27.self_attn.q_a_layernorm",
      "model.layers.28.input_layernorm",
      "model.layers.28.mlp.gate",
      "model.layers.28.mlp.gate.e_score_correction_bias",
      "model.layers.28.post_attention_layernorm",
      "model.layers.28.self_attn.indexer.k_norm",
      "model.layers.28.self_attn.indexer.k_norm.bias",
      "model.layers.28.self_attn.indexers_proj",
      "model.layers.28.self_attn.kv_a_layernorm",
      "model.layers.28.self_attn.q_a_layernorm",
      "model.layers.29.input_layernorm",
      "model.layers.29.mlp.gate",
      "model.layers.29.mlp.gate.e_score_correction_bias",
      "model.layers.29.post_attention_layernorm",
      "model.layers.29.self_attn.indexer.k_norm",
      "model.layers.29.self_attn.indexer.k_norm.bias",
      "model.layers.29.self_attn.indexers_proj",
      "model.layers.29.self_attn.kv_a_layernorm",
      "model.layers.29.self_attn.q_a_layernorm",
      "model.layers.30.input_layernorm",
      "model.layers.30.mlp.gate",
      "model.layers.30.mlp.gate.e_score_correction_bias",
      "model.layers.30.post_attention_layernorm",
      "model.layers.30.self_attn.indexer.k_norm",
      "model.layers.30.self_attn.indexer.k_norm.bias",
      "model.layers.30.self_attn.indexers_proj",
      "model.layers.30.self_attn.kv_a_layernorm",
      "model.layers.30.self_attn.q_a_layernorm",
      "model.layers.31.input_layernorm",
      "model.layers.31.mlp.gate",
      "model.layers.31.mlp.gate.e_score_correction_bias",
      "model.layers.31.post_attention_layernorm",
      "model.layers.31.self_attn.indexer.k_norm",
      "model.layers.31.self_attn.indexer.k_norm.bias",
      "model.layers.31.self_attn.indexers_proj",
      "model.layers.31.self_attn.kv_a_layernorm",
      "model.layers.31.self_attn.q_a_layernorm",
      "model.layers.32.input_layernorm",
      "model.layers.32.mlp.gate",
      "model.layers.32.mlp.gate.e_score_correction_bias",
      "model.layers.32.post_attention_layernorm",
      "model.layers.32.self_attn.indexer.k_norm",
      "model.layers.32.self_attn.indexer.k_norm.bias",
      "model.layers.32.self_attn.indexers_proj",
      "model.layers.32.self_attn.kv_a_layernorm",
      "model.layers.32.self_attn.q_a_layernorm",
      "model.layers.33.input_layernorm",
      "model.layers.33.mlp.gate",
      "model.layers.33.mlp.gate.e_score_correction_bias",
      "model.layers.33.post_attention_layernorm",
      "model.layers.33.self_attn.indexer.k_norm",
      "model.layers.33.self_attn.indexer.k_norm.bias",
      "model.layers.33.self_attn.indexers_proj",
      "model.layers.33.self_attn.kv_a_layernorm",
      "model.layers.33.self_attn.q_a_layernorm",
      "model.layers.34.input_layernorm",
      "model.layers.34.mlp.gate",
      "model.layers.34.mlp.gate.e_score_correction_bias",
      "model.layers.34.post_attention_layernorm",
      "model.layers.34.self_attn.indexer.k_norm",
      "model.layers.34.self_attn.indexer.k_norm.bias",
      "model.layers.34.self_attn.indexers_proj",
      "model.layers.34.self_attn.kv_a_layernorm",
      "model.layers.34.self_attn.q_a_layernorm",
      "model.layers.35.input_layernorm",
      "model.layers.35.mlp.gate",
      "model.layers.35.mlp.gate.e_score_correction_bias",
      "model.layers.35.post_attention_layernorm",
      "model.layers.35.self_attn.indexer.k_norm",
      "model.layers.35.self_attn.indexer.k_norm.bias",
      "model.layers.35.self_attn.indexers_proj",
      "model.layers.35.self_attn.kv_a_layernorm",
      "model.layers.35.self_attn.q_a_layernorm",
      "model.layers.36.input_layernorm",
      "model.layers.36.mlp.gate",
390
+ "model.layers.36.mlp.gate.e_score_correction_bias",
391
+ "model.layers.36.post_attention_layernorm",
392
+ "model.layers.36.self_attn.indexer.k_norm",
393
+ "model.layers.36.self_attn.indexer.k_norm.bias",
394
+ "model.layers.36.self_attn.indexers_proj",
395
+ "model.layers.36.self_attn.kv_a_layernorm",
396
+ "model.layers.36.self_attn.q_a_layernorm",
397
+ "model.layers.37.input_layernorm",
398
+ "model.layers.37.mlp.gate",
399
+ "model.layers.37.mlp.gate.e_score_correction_bias",
400
+ "model.layers.37.post_attention_layernorm",
401
+ "model.layers.37.self_attn.indexer.k_norm",
402
+ "model.layers.37.self_attn.indexer.k_norm.bias",
403
+ "model.layers.37.self_attn.indexers_proj",
404
+ "model.layers.37.self_attn.kv_a_layernorm",
405
+ "model.layers.37.self_attn.q_a_layernorm",
406
+ "model.layers.38.input_layernorm",
407
+ "model.layers.38.mlp.gate",
408
+ "model.layers.38.mlp.gate.e_score_correction_bias",
409
+ "model.layers.38.post_attention_layernorm",
410
+ "model.layers.38.self_attn.indexer.k_norm",
411
+ "model.layers.38.self_attn.indexer.k_norm.bias",
412
+ "model.layers.38.self_attn.indexers_proj",
413
+ "model.layers.38.self_attn.kv_a_layernorm",
414
+ "model.layers.38.self_attn.q_a_layernorm",
415
+ "model.layers.39.input_layernorm",
416
+ "model.layers.39.mlp.gate",
417
+ "model.layers.39.mlp.gate.e_score_correction_bias",
418
+ "model.layers.39.post_attention_layernorm",
419
+ "model.layers.39.self_attn.indexer.k_norm",
420
+ "model.layers.39.self_attn.indexer.k_norm.bias",
421
+ "model.layers.39.self_attn.indexers_proj",
422
+ "model.layers.39.self_attn.kv_a_layernorm",
423
+ "model.layers.39.self_attn.q_a_layernorm",
424
+ "model.layers.40.input_layernorm",
425
+ "model.layers.40.mlp.gate",
426
+ "model.layers.40.mlp.gate.e_score_correction_bias",
427
+ "model.layers.40.post_attention_layernorm",
428
+ "model.layers.40.self_attn.indexer.k_norm",
429
+ "model.layers.40.self_attn.indexer.k_norm.bias",
430
+ "model.layers.40.self_attn.indexers_proj",
431
+ "model.layers.40.self_attn.kv_a_layernorm",
432
+ "model.layers.40.self_attn.q_a_layernorm",
433
+ "model.layers.41.input_layernorm",
434
+ "model.layers.41.mlp.gate",
435
+ "model.layers.41.mlp.gate.e_score_correction_bias",
436
+ "model.layers.41.post_attention_layernorm",
437
+ "model.layers.41.self_attn.indexer.k_norm",
438
+ "model.layers.41.self_attn.indexer.k_norm.bias",
439
+ "model.layers.41.self_attn.indexers_proj",
440
+ "model.layers.41.self_attn.kv_a_layernorm",
441
+ "model.layers.41.self_attn.q_a_layernorm",
442
+ "model.layers.42.input_layernorm",
443
+ "model.layers.42.mlp.gate",
444
+ "model.layers.42.mlp.gate.e_score_correction_bias",
445
+ "model.layers.42.post_attention_layernorm",
446
+ "model.layers.42.self_attn.indexer.k_norm",
447
+ "model.layers.42.self_attn.indexer.k_norm.bias",
448
+ "model.layers.42.self_attn.indexers_proj",
449
+ "model.layers.42.self_attn.kv_a_layernorm",
450
+ "model.layers.42.self_attn.q_a_layernorm",
451
+ "model.layers.43.input_layernorm",
452
+ "model.layers.43.mlp.gate",
453
+ "model.layers.43.mlp.gate.e_score_correction_bias",
454
+ "model.layers.43.post_attention_layernorm",
455
+ "model.layers.43.self_attn.indexer.k_norm",
456
+ "model.layers.43.self_attn.indexer.k_norm.bias",
457
+ "model.layers.43.self_attn.indexers_proj",
458
+ "model.layers.43.self_attn.kv_a_layernorm",
459
+ "model.layers.43.self_attn.q_a_layernorm",
460
+ "model.layers.44.input_layernorm",
461
+ "model.layers.44.mlp.gate",
462
+ "model.layers.44.mlp.gate.e_score_correction_bias",
463
+ "model.layers.44.post_attention_layernorm",
464
+ "model.layers.44.self_attn.indexer.k_norm",
465
+ "model.layers.44.self_attn.indexer.k_norm.bias",
466
+ "model.layers.44.self_attn.indexers_proj",
467
+ "model.layers.44.self_attn.kv_a_layernorm",
468
+ "model.layers.44.self_attn.q_a_layernorm",
469
+ "model.layers.45.input_layernorm",
470
+ "model.layers.45.mlp.gate",
471
+ "model.layers.45.mlp.gate.e_score_correction_bias",
472
+ "model.layers.45.post_attention_layernorm",
473
+ "model.layers.45.self_attn.indexer.k_norm",
474
+ "model.layers.45.self_attn.indexer.k_norm.bias",
475
+ "model.layers.45.self_attn.indexers_proj",
476
+ "model.layers.45.self_attn.kv_a_layernorm",
477
+ "model.layers.45.self_attn.q_a_layernorm",
478
+ "model.layers.46.input_layernorm",
479
+ "model.layers.46.mlp.gate",
480
+ "model.layers.46.mlp.gate.e_score_correction_bias",
481
+ "model.layers.46.post_attention_layernorm",
482
+ "model.layers.46.self_attn.indexer.k_norm",
483
+ "model.layers.46.self_attn.indexer.k_norm.bias",
484
+ "model.layers.46.self_attn.indexers_proj",
485
+ "model.layers.46.self_attn.kv_a_layernorm",
486
+ "model.layers.46.self_attn.q_a_layernorm",
487
+ "model.layers.47.input_layernorm",
488
+ "model.layers.47.mlp.gate",
489
+ "model.layers.47.mlp.gate.e_score_correction_bias",
490
+ "model.layers.47.post_attention_layernorm",
491
+ "model.layers.47.self_attn.indexer.k_norm",
492
+ "model.layers.47.self_attn.indexer.k_norm.bias",
493
+ "model.layers.47.self_attn.indexers_proj",
494
+ "model.layers.47.self_attn.kv_a_layernorm",
495
+ "model.layers.47.self_attn.q_a_layernorm",
496
+ "model.layers.48.input_layernorm",
497
+ "model.layers.48.mlp.gate",
498
+ "model.layers.48.mlp.gate.e_score_correction_bias",
499
+ "model.layers.48.post_attention_layernorm",
500
+ "model.layers.48.self_attn.indexer.k_norm",
501
+ "model.layers.48.self_attn.indexer.k_norm.bias",
502
+ "model.layers.48.self_attn.indexers_proj",
503
+ "model.layers.48.self_attn.kv_a_layernorm",
504
+ "model.layers.48.self_attn.q_a_layernorm",
505
+ "model.layers.49.input_layernorm",
506
+ "model.layers.49.mlp.gate",
507
+ "model.layers.49.mlp.gate.e_score_correction_bias",
508
+ "model.layers.49.post_attention_layernorm",
509
+ "model.layers.49.self_attn.indexer.k_norm",
510
+ "model.layers.49.self_attn.indexer.k_norm.bias",
511
+ "model.layers.49.self_attn.indexers_proj",
512
+ "model.layers.49.self_attn.kv_a_layernorm",
513
+ "model.layers.49.self_attn.q_a_layernorm",
514
+ "model.layers.50.input_layernorm",
515
+ "model.layers.50.mlp.gate",
516
+ "model.layers.50.mlp.gate.e_score_correction_bias",
517
+ "model.layers.50.post_attention_layernorm",
518
+ "model.layers.50.self_attn.indexer.k_norm",
519
+ "model.layers.50.self_attn.indexer.k_norm.bias",
520
+ "model.layers.50.self_attn.indexers_proj",
521
+ "model.layers.50.self_attn.kv_a_layernorm",
522
+ "model.layers.50.self_attn.q_a_layernorm",
523
+ "model.layers.51.input_layernorm",
524
+ "model.layers.51.mlp.gate",
525
+ "model.layers.51.mlp.gate.e_score_correction_bias",
526
+ "model.layers.51.post_attention_layernorm",
527
+ "model.layers.51.self_attn.indexer.k_norm",
528
+ "model.layers.51.self_attn.indexer.k_norm.bias",
529
+ "model.layers.51.self_attn.indexers_proj",
530
+ "model.layers.51.self_attn.kv_a_layernorm",
531
+ "model.layers.51.self_attn.q_a_layernorm",
532
+ "model.layers.52.input_layernorm",
533
+ "model.layers.52.mlp.gate",
534
+ "model.layers.52.mlp.gate.e_score_correction_bias",
535
+ "model.layers.52.post_attention_layernorm",
536
+ "model.layers.52.self_attn.indexer.k_norm",
537
+ "model.layers.52.self_attn.indexer.k_norm.bias",
538
+ "model.layers.52.self_attn.indexers_proj",
539
+ "model.layers.52.self_attn.kv_a_layernorm",
540
+ "model.layers.52.self_attn.q_a_layernorm",
541
+ "model.layers.53.input_layernorm",
542
+ "model.layers.53.mlp.gate",
543
+ "model.layers.53.mlp.gate.e_score_correction_bias",
544
+ "model.layers.53.post_attention_layernorm",
545
+ "model.layers.53.self_attn.indexer.k_norm",
546
+ "model.layers.53.self_attn.indexer.k_norm.bias",
547
+ "model.layers.53.self_attn.indexers_proj",
548
+ "model.layers.53.self_attn.kv_a_layernorm",
549
+ "model.layers.53.self_attn.q_a_layernorm",
550
+ "model.layers.54.input_layernorm",
551
+ "model.layers.54.mlp.gate",
552
+ "model.layers.54.mlp.gate.e_score_correction_bias",
553
+ "model.layers.54.post_attention_layernorm",
554
+ "model.layers.54.self_attn.indexer.k_norm",
555
+ "model.layers.54.self_attn.indexer.k_norm.bias",
556
+ "model.layers.54.self_attn.indexers_proj",
557
+ "model.layers.54.self_attn.kv_a_layernorm",
558
+ "model.layers.54.self_attn.q_a_layernorm",
559
+ "model.layers.55.input_layernorm",
560
+ "model.layers.55.mlp.gate",
561
+ "model.layers.55.mlp.gate.e_score_correction_bias",
562
+ "model.layers.55.post_attention_layernorm",
563
+ "model.layers.55.self_attn.indexer.k_norm",
564
+ "model.layers.55.self_attn.indexer.k_norm.bias",
565
+ "model.layers.55.self_attn.indexers_proj",
566
+ "model.layers.55.self_attn.kv_a_layernorm",
567
+ "model.layers.55.self_attn.q_a_layernorm",
568
+ "model.layers.56.input_layernorm",
569
+ "model.layers.56.mlp.gate",
570
+ "model.layers.56.mlp.gate.e_score_correction_bias",
571
+ "model.layers.56.post_attention_layernorm",
572
+ "model.layers.56.self_attn.indexer.k_norm",
573
+ "model.layers.56.self_attn.indexer.k_norm.bias",
574
+ "model.layers.56.self_attn.indexers_proj",
575
+ "model.layers.56.self_attn.kv_a_layernorm",
576
+ "model.layers.56.self_attn.q_a_layernorm",
577
+ "model.layers.57.input_layernorm",
578
+ "model.layers.57.mlp.gate",
579
+ "model.layers.57.mlp.gate.e_score_correction_bias",
580
+ "model.layers.57.post_attention_layernorm",
581
+ "model.layers.57.self_attn.indexer.k_norm",
582
+ "model.layers.57.self_attn.indexer.k_norm.bias",
583
+ "model.layers.57.self_attn.indexers_proj",
584
+ "model.layers.57.self_attn.kv_a_layernorm",
585
+ "model.layers.57.self_attn.q_a_layernorm",
586
+ "model.layers.58.input_layernorm",
587
+ "model.layers.58.mlp.gate",
588
+ "model.layers.58.mlp.gate.e_score_correction_bias",
589
+ "model.layers.58.post_attention_layernorm",
590
+ "model.layers.58.self_attn.indexer.k_norm",
591
+ "model.layers.58.self_attn.indexer.k_norm.bias",
592
+ "model.layers.58.self_attn.indexers_proj",
593
+ "model.layers.58.self_attn.kv_a_layernorm",
594
+ "model.layers.58.self_attn.q_a_layernorm",
595
+ "model.layers.59.input_layernorm",
596
+ "model.layers.59.mlp.gate",
597
+ "model.layers.59.mlp.gate.e_score_correction_bias",
598
+ "model.layers.59.post_attention_layernorm",
599
+ "model.layers.59.self_attn.indexer.k_norm",
600
+ "model.layers.59.self_attn.indexer.k_norm.bias",
601
+ "model.layers.59.self_attn.indexers_proj",
602
+ "model.layers.59.self_attn.kv_a_layernorm",
603
+ "model.layers.59.self_attn.q_a_layernorm",
604
+ "model.layers.60.input_layernorm",
605
+ "model.layers.60.mlp.gate",
606
+ "model.layers.60.mlp.gate.e_score_correction_bias",
607
+ "model.layers.60.post_attention_layernorm",
608
+ "model.layers.60.self_attn.indexer.k_norm",
609
+ "model.layers.60.self_attn.indexer.k_norm.bias",
610
+ "model.layers.60.self_attn.indexers_proj",
611
+ "model.layers.60.self_attn.kv_a_layernorm",
612
+ "model.layers.60.self_attn.q_a_layernorm",
613
+ "model.layers.61.input_layernorm",
614
+ "model.layers.61.mlp.gate",
615
+ "model.layers.61.mlp.gate.e_score_correction_bias",
616
+ "model.layers.61.post_attention_layernorm",
617
+ "model.layers.61.self_attn.indexer.k_norm",
618
+ "model.layers.61.self_attn.indexer.k_norm.bias",
619
+ "model.layers.61.self_attn.indexers_proj",
620
+ "model.layers.61.self_attn.kv_a_layernorm",
621
+ "model.layers.61.self_attn.q_a_layernorm",
622
+ "model.layers.62.input_layernorm",
623
+ "model.layers.62.mlp.gate",
624
+ "model.layers.62.mlp.gate.e_score_correction_bias",
625
+ "model.layers.62.post_attention_layernorm",
626
+ "model.layers.62.self_attn.indexer.k_norm",
627
+ "model.layers.62.self_attn.indexer.k_norm.bias",
628
+ "model.layers.62.self_attn.indexers_proj",
629
+ "model.layers.62.self_attn.kv_a_layernorm",
630
+ "model.layers.62.self_attn.q_a_layernorm",
631
+ "model.layers.63.input_layernorm",
632
+ "model.layers.63.mlp.gate",
633
+ "model.layers.63.mlp.gate.e_score_correction_bias",
634
+ "model.layers.63.post_attention_layernorm",
635
+ "model.layers.63.self_attn.indexer.k_norm",
636
+ "model.layers.63.self_attn.indexer.k_norm.bias",
637
+ "model.layers.63.self_attn.indexers_proj",
638
+ "model.layers.63.self_attn.kv_a_layernorm",
639
+ "model.layers.63.self_attn.q_a_layernorm",
640
+ "model.layers.64.input_layernorm",
641
+ "model.layers.64.mlp.gate",
642
+ "model.layers.64.mlp.gate.e_score_correction_bias",
643
+ "model.layers.64.post_attention_layernorm",
644
+ "model.layers.64.self_attn.indexer.k_norm",
645
+ "model.layers.64.self_attn.indexer.k_norm.bias",
646
+ "model.layers.64.self_attn.indexers_proj",
647
+ "model.layers.64.self_attn.kv_a_layernorm",
648
+ "model.layers.64.self_attn.q_a_layernorm",
649
+ "model.layers.65.input_layernorm",
650
+ "model.layers.65.mlp.gate",
651
+ "model.layers.65.mlp.gate.e_score_correction_bias",
652
+ "model.layers.65.post_attention_layernorm",
653
+ "model.layers.65.self_attn.indexer.k_norm",
654
+ "model.layers.65.self_attn.indexer.k_norm.bias",
655
+ "model.layers.65.self_attn.indexers_proj",
656
+ "model.layers.65.self_attn.kv_a_layernorm",
657
+ "model.layers.65.self_attn.q_a_layernorm",
658
+ "model.layers.66.input_layernorm",
659
+ "model.layers.66.mlp.gate",
660
+ "model.layers.66.mlp.gate.e_score_correction_bias",
661
+ "model.layers.66.post_attention_layernorm",
662
+ "model.layers.66.self_attn.indexer.k_norm",
663
+ "model.layers.66.self_attn.indexer.k_norm.bias",
664
+ "model.layers.66.self_attn.indexers_proj",
665
+ "model.layers.66.self_attn.kv_a_layernorm",
666
+ "model.layers.66.self_attn.q_a_layernorm",
667
+ "model.layers.67.input_layernorm",
668
+ "model.layers.67.mlp.gate",
669
+ "model.layers.67.mlp.gate.e_score_correction_bias",
670
+ "model.layers.67.post_attention_layernorm",
671
+ "model.layers.67.self_attn.indexer.k_norm",
672
+ "model.layers.67.self_attn.indexer.k_norm.bias",
673
+ "model.layers.67.self_attn.indexers_proj",
674
+ "model.layers.67.self_attn.kv_a_layernorm",
675
+ "model.layers.67.self_attn.q_a_layernorm",
676
+ "model.layers.68.input_layernorm",
677
+ "model.layers.68.mlp.gate",
678
+ "model.layers.68.mlp.gate.e_score_correction_bias",
679
+ "model.layers.68.post_attention_layernorm",
680
+ "model.layers.68.self_attn.indexer.k_norm",
681
+ "model.layers.68.self_attn.indexer.k_norm.bias",
682
+ "model.layers.68.self_attn.indexers_proj",
683
+ "model.layers.68.self_attn.kv_a_layernorm",
684
+ "model.layers.68.self_attn.q_a_layernorm",
685
+ "model.layers.69.input_layernorm",
686
+ "model.layers.69.mlp.gate",
687
+ "model.layers.69.mlp.gate.e_score_correction_bias",
688
+ "model.layers.69.post_attention_layernorm",
689
+ "model.layers.69.self_attn.indexer.k_norm",
690
+ "model.layers.69.self_attn.indexer.k_norm.bias",
691
+ "model.layers.69.self_attn.indexers_proj",
692
+ "model.layers.69.self_attn.kv_a_layernorm",
693
+ "model.layers.69.self_attn.q_a_layernorm",
694
+ "model.layers.70.input_layernorm",
695
+ "model.layers.70.mlp.gate",
696
+ "model.layers.70.mlp.gate.e_score_correction_bias",
697
+ "model.layers.70.post_attention_layernorm",
698
+ "model.layers.70.self_attn.indexer.k_norm",
699
+ "model.layers.70.self_attn.indexer.k_norm.bias",
700
+ "model.layers.70.self_attn.indexers_proj",
701
+ "model.layers.70.self_attn.kv_a_layernorm",
702
+ "model.layers.70.self_attn.q_a_layernorm",
703
+ "model.layers.71.input_layernorm",
704
+ "model.layers.71.mlp.gate",
705
+ "model.layers.71.mlp.gate.e_score_correction_bias",
706
+ "model.layers.71.post_attention_layernorm",
707
+ "model.layers.71.self_attn.indexer.k_norm",
708
+ "model.layers.71.self_attn.indexer.k_norm.bias",
709
+ "model.layers.71.self_attn.indexers_proj",
710
+ "model.layers.71.self_attn.kv_a_layernorm",
711
+ "model.layers.71.self_attn.q_a_layernorm",
712
+ "model.layers.72.input_layernorm",
713
+ "model.layers.72.mlp.gate",
714
+ "model.layers.72.mlp.gate.e_score_correction_bias",
715
+ "model.layers.72.post_attention_layernorm",
716
+ "model.layers.72.self_attn.indexer.k_norm",
717
+ "model.layers.72.self_attn.indexer.k_norm.bias",
718
+ "model.layers.72.self_attn.indexers_proj",
719
+ "model.layers.72.self_attn.kv_a_layernorm",
720
+ "model.layers.72.self_attn.q_a_layernorm",
721
+ "model.layers.73.input_layernorm",
722
+ "model.layers.73.mlp.gate",
723
+ "model.layers.73.mlp.gate.e_score_correction_bias",
724
+ "model.layers.73.post_attention_layernorm",
725
+ "model.layers.73.self_attn.indexer.k_norm",
726
+ "model.layers.73.self_attn.indexer.k_norm.bias",
727
+ "model.layers.73.self_attn.indexers_proj",
728
+ "model.layers.73.self_attn.kv_a_layernorm",
729
+ "model.layers.73.self_attn.q_a_layernorm",
730
+ "model.layers.74.input_layernorm",
731
+ "model.layers.74.mlp.gate",
732
+ "model.layers.74.mlp.gate.e_score_correction_bias",
733
+ "model.layers.74.post_attention_layernorm",
734
+ "model.layers.74.self_attn.indexer.k_norm",
735
+ "model.layers.74.self_attn.indexer.k_norm.bias",
736
+ "model.layers.74.self_attn.indexers_proj",
737
+ "model.layers.74.self_attn.kv_a_layernorm",
738
+ "model.layers.74.self_attn.q_a_layernorm",
739
+ "model.layers.75.input_layernorm",
740
+ "model.layers.75.mlp.gate",
741
+ "model.layers.75.mlp.gate.e_score_correction_bias",
742
+ "model.layers.75.post_attention_layernorm",
743
+ "model.layers.75.self_attn.indexer.k_norm",
744
+ "model.layers.75.self_attn.indexer.k_norm.bias",
745
+ "model.layers.75.self_attn.indexers_proj",
746
+ "model.layers.75.self_attn.kv_a_layernorm",
747
+ "model.layers.75.self_attn.q_a_layernorm",
748
+ "model.layers.76.input_layernorm",
749
+ "model.layers.76.mlp.gate",
750
+ "model.layers.76.mlp.gate.e_score_correction_bias",
751
+ "model.layers.76.post_attention_layernorm",
752
+ "model.layers.76.self_attn.indexer.k_norm",
753
+ "model.layers.76.self_attn.indexer.k_norm.bias",
754
+ "model.layers.76.self_attn.indexers_proj",
755
+ "model.layers.76.self_attn.kv_a_layernorm",
756
+ "model.layers.76.self_attn.q_a_layernorm",
757
+ "model.layers.77.input_layernorm",
758
+ "model.layers.77.mlp.gate",
759
+ "model.layers.77.mlp.gate.e_score_correction_bias",
760
+ "model.layers.77.post_attention_layernorm",
761
+ "model.layers.77.self_attn.indexer.k_norm",
762
+ "model.layers.77.self_attn.indexer.k_norm.bias",
763
+ "model.layers.77.self_attn.indexers_proj",
764
+ "model.layers.77.self_attn.kv_a_layernorm",
765
+ "model.layers.77.self_attn.q_a_layernorm",
766
+ "model.layers.78.eh_proj",
767
+ "model.layers.78.enorm",
768
+ "model.layers.78.hnorm",
769
+ "model.layers.78.input_layernorm",
770
+ "model.layers.78.mlp.gate",
771
+ "model.layers.78.mlp.gate.e_score_correction_bias",
772
+ "model.layers.78.post_attention_layernorm",
773
+ "model.layers.78.self_attn.indexer.k_norm",
774
+ "model.layers.78.self_attn.indexer.k_norm.bias",
775
+ "model.layers.78.self_attn.indexers_proj",
776
+ "model.layers.78.self_attn.kv_a_layernorm",
777
+ "model.layers.78.self_attn.q_a_layernorm",
778
+ "model.layers.78.shared_head.norm",
779
+ "model.norm"
780
+ ]
781
+ }
782
+ }
generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "_from_model_config": true,
+ "eos_token_id": [
+ 154820,
+ 154827,
+ 154829
+ ],
+ "pad_token_id": 154820,
+ "temperature": 1.0,
+ "top_p": 0.95,
+ "transformers_version": "5.0.2.dev0"
+ }
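The generation defaults added above can be sanity-checked without loading the model; a minimal sketch (the JSON is copied verbatim from generation_config.json, and the stop-token handling shown is a common convention, not something this commit specifies):

```python
import json

# Generation defaults from generation_config.json in this commit.
GENERATION_CONFIG = """
{
  "_from_model_config": true,
  "eos_token_id": [154820, 154827, 154829],
  "pad_token_id": 154820,
  "temperature": 1.0,
  "top_p": 0.95,
  "transformers_version": "5.0.2.dev0"
}
"""

config = json.loads(GENERATION_CONFIG)

# Decoding should stop on any of the listed EOS ids; note the pad id
# reuses the first EOS id, which many chat models do.
stop_ids = set(config["eos_token_id"])
assert config["pad_token_id"] in stop_ids
print(sorted(stop_ids))  # → [154820, 154827, 154829]
```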
model-00001-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c75e4dc9f7b1e345170f5cd3c47c764dfe631d7895008f5efdc82b2dde92de4c
+ size 5363940952
model-00002-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a17170e28d81d697955ae340c0927b648e8e1f6b06a6ae8051f286c8f74f725
+ size 5361736696
model-00003-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1370f0035979329b828d21b11369784f52dcdb19bf0268ad934c18f79c0f44c6
+ size 5363339120
model-00004-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e72700bd7b0f1bf6fd96ee23f8b8895a8d00ac9b8b9c11446f297d82c91015a4
+ size 5361736640
model-00005-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e518a59a00bc1ec97d36f19734e21152e6a9faeb193d98e1278356be40101cc9
+ size 5363339176
model-00006-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d37be93a5270cf8f7e4c20e5a733a31aff2bd83bc9656e9b3d7dc1b9a2d5926
+ size 5361736504
model-00007-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f096b79ea10167b5e9fc66e99f0654c684893009cef9342ed72a363631b552d
+ size 5363339304
model-00008-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39cdaf4f718d51bd65c5cc812cce5fd116742ba34904925b2251fe92e380a3ab
+ size 5361736368
model-00009-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a294efbbf5c7f0eca60797c7a6902045d84896371cc5815ca431ce52afcc567
+ size 5363339440
model-00010-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:390eaf7f2c80d1ac017521f4a5113677abfed056753d05c451a7c2a546384998
+ size 5361736232
model-00011-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f0dcfd2ac425ac53d1f8496738a1d589abf751ed417f8208bc806ec64e2ebc2
+ size 5363339592
model-00012-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3162c33985af89993a70da3530ff552c4885e569f8ca06e84b3bead004488a8
+ size 5363339104
model-00013-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2d1702e2f1958d30eafb81c69f0abf0989c3d8bc82b752793a246c6cc60aa1c
+ size 5361736696
model-00014-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:81603c43fd54122915e648c6760efb06b2074ddf979220702f0a408776a8749f
+ size 5363339120
model-00015-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a9a7e63e77ebefad25c210f3448528febd9233c66764e772fa0d87673f35ba1
+ size 5361736688
model-00016-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9868c6b52a816ef19ee78bdb40d3a9f588dd7d8b55e07ef669ceb7cc8e215405
+ size 5363339128
model-00017-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8b15acba1b715f59fd5faff35837c873a5c2ba3cbd18f761e230cd0162305a34
+ size 5361736552
model-00018-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52507d77cd4eadb12479d79acb5d860fa782de2ddc244d11f9e5a80a63127750
+ size 5363339256
model-00019-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57cb5e9b3ca010298d6449f011dc04db5c66c70bac463dce08d63a8d60654858
+ size 5361736416
model-00020-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18b9acd03945bbf30331843a692e68e0ac009355471a5b2c3d887a51d79d5f08
+ size 5361791448
model-00021-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7d4fcf67f0abda1ac39e21cdf4d004f90109a33ee456ce39e53ea744c2c2fd1
+ size 5361736352
model-00022-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7329ef4f994a7ac22901e465f943665708ad88236e630cae0532cf75ee4cc474
+ size 5363339456
model-00023-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d546f87e34309770629fdac04b16e16fee4abe435b9613feb1236ec95a0380fc
+ size 5361736224
model-00024-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79b755e74f642cb8ec7627f50414a6efe617485200f29acb65b340c78a603a79
+ size 5363339608
model-00025-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b7c7a8ac34e62424109fdc5d58bd5667adae8dcc8bf80254142ca14113f575d
+ size 5363339104
model-00026-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18d098486d2b0fe98046868dfc5c052f752768df47571eb250c8629e99a90f4b
+ size 5361736696
model-00027-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15dfa9511f0446684a5afc4a7afff701a2c03c72ff16750c61e6d11f19e63f22
+ size 5363339112
model-00028-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d3d1c850ccb27716365a4e7a6078f7a589d2b94b780f0fd7f6ac5222d75402d
+ size 5361736664
model-00029-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b061a3c1f07728dedc62e70e1382882c6f28c9d4c3657e68c7edb1f9d8b3c7a9
+ size 5363339152
model-00030-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b2de093fc6246082803e35e0d9f4a44f674a3182157df6afa1616aa1981ddc8d
+ size 5361736528
model-00031-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:58f45b0fbdcec9210e20e21d83d85c916f4c7a8f82589950960b17b0ea882eb6
+ size 5363339280
model-00032-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5107083d264ed50fa120c4a527c80c61c79406918b16200a9db23bb5ff798c6d
+ size 5361736392
model-00033-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9cd86b5c670b8f1cf16a69e25e96ff9f09cc495eb7978ada4ac8eb14729f4586
+ size 5363339416
model-00034-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a6d39453d8ca90f5e38e0d51a95fd585766ad95488fb79d9f5d651bbcfe765bd
+ size 5361736264
model-00035-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9d557ec5601653b8be2bb8ed3121159b368a53e6c6e27a7a901f2be859a3e32
+ size 5363339544
model-00036-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e162b3ad7bb627cd8984e41a62d049f070321b4b30ee7deb335f88a9ab93de9
+ size 5363339104
model-00037-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f980f59d1f5bc3dd9a717824b652045da1ea20b32aee9593811453e97d75c64
+ size 5361736696
model-00038-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:74963234bf95c73eb5d2fcfefb91e266b8174d3bef480f1080dd7548f3e43110
+ size 5363338920
model-00039-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9e10c68987f919c926469f7adb27d7279989dac736bb7115e9d8f19f9e7d15b
+ size 5361735840
model-00040-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ea19f319667a7ff62aea9182eb39e676c44a1b2792464ede0dff578ca255030
+ size 5363338584
model-00041-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ff662658a03396680e04029d8d2fd8963505a3731d8abbaa3112fccaa75c7af7
+ size 5361736576
model-00042-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:77fdeae54c5306335c557fe56a2d4065fe7f8bcbe38f89751b98584490f88900
+ size 5363339232
model-00043-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:53dba89973165ece2063a47af49ffddbd92393728164eec82e522d99f360dbe9
+ size 5361736440
model-00044-of-00142.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d2aa1c3abc6c15bdc46c0a675729715add84dd7d9218165618a728ad6a3e5dc
+ size 5363339368
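Each safetensors shard above is committed as a Git LFS pointer file: three `key value` lines giving the spec version, a `sha256:` object id, and the byte size of the real blob. A minimal sketch of parsing such a pointer and checking a downloaded blob against it (the pointer text is copied from model-00001-of-00142.safetensors above; `verify_blob` is an illustrative helper, not part of any LFS tooling):

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS v1 pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

def verify_blob(fields: dict, blob: bytes) -> bool:
    """Check a downloaded blob against the pointer's sha256 oid and size."""
    algo, _, digest = fields["oid"].partition(":")
    assert algo == "sha256"  # the pointers in this commit all use sha256
    return (len(blob) == int(fields["size"])
            and hashlib.sha256(blob).hexdigest() == digest)

# Pointer copied verbatim from model-00001-of-00142.safetensors.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:c75e4dc9f7b1e345170f5cd3c47c764dfe631d7895008f5efdc82b2dde92de4c
size 5363940952
"""
fields = parse_lfs_pointer(pointer)
print(fields["size"])  # → 5363940952
```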
tokenizer_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+ "backend": "tokenizers",
+ "clean_up_tokenization_spaces": false,
+ "do_lower_case": false,
+ "eos_token": "<|endoftext|>",
+ "extra_special_tokens": [
+ "<|endoftext|>",
+ "[MASK]",
+ "[gMASK]",
+ "[sMASK]",
+ "<sop>",
+ "<eop>",
+ "<|system|>",
+ "<|user|>",
+ "<|assistant|>",
+ "<|observation|>",
+ "<|begin_of_image|>",
+ "<|end_of_image|>",
+ "<|begin_of_video|>",
+ "<|end_of_video|>",
+ "<|begin_of_audio|>",
+ "<|end_of_audio|>",
+ "<|begin_of_transcription|>",
+ "<|end_of_transcription|>"
+ ],
+ "is_local": true,
+ "model_max_length": 202752,
+ "model_specific_special_tokens": {},
+ "pad_token": "<|endoftext|>",
+ "padding_side": "left",
+ "remove_space": false,
+ "tokenizer_class": "TokenizersBackend"
+ }
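A client can read the tokenizer settings above directly as JSON to discover padding behaviour and the context limit; a minimal sketch over a subset of the fields (values copied verbatim from tokenizer_config.json, with the interpretation in the comments being general convention rather than anything this commit states):

```python
import json

# Subset of the tokenizer_config.json fields added in this commit.
TOKENIZER_CONFIG = """
{
  "eos_token": "<|endoftext|>",
  "pad_token": "<|endoftext|>",
  "padding_side": "left",
  "model_max_length": 202752,
  "tokenizer_class": "TokenizersBackend"
}
"""

cfg = json.loads(TOKENIZER_CONFIG)

# Left padding with pad == eos is typical for decoder-only chat models:
# prompts are padded on the left so generation starts from real tokens.
assert cfg["pad_token"] == cfg["eos_token"]
print(cfg["model_max_length"])  # → 202752
```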