windlx committed
Commit ecfd7d2 · verified · 1 Parent(s): 1e9eb5d

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,188 +1,207 @@
  ---
- license: mit
- language:
- - zh
- - en
- datasets:
- - IowaCat/page_type_inference_dataset
- metrics:
- - accuracy: 0.99
  pipeline_tag: text-generation
  tags:
- - url-classification
- - list-page-detection
- - detail-page-detection
- - qwen
- - fine-tuning
  - lora
- - url-parser
- widget:
- - text: "https://example.com/product/12345"
- - text: "https://example.com/category/electronics"
  ---

- # URL Page Type Classifier

- <div align="center">

- ![Model Size](https://img.shields.io/badge/Model%20Size-1.5B-blue)
- ![License](https://img.shields.io/badge/License-MIT-green)
- ![Accuracy](https://img.shields.io/badge/Accuracy-99%25-green)

- </div>

- ## 📋 Overview

- A URL type classifier fine-tuned from Qwen2.5-1.5B with LoRA; it predicts whether a URL points to a list page or a detail page.

- ## 🏗️ Model Architecture

- | Item | Details |
- |------|---------|
- | **Base model** | Qwen/Qwen2.5-1.5B |
- | **Fine-tuning method** | LoRA (r=16, alpha=32) |
- | **Parameter count** | 1.5B |
- | **Trainable parameters** | ~18M (1.18%) |

- ## 📊 Training Data

- - **Dataset**: IowaCat/page_type_inference_dataset
- - **Training samples**: 10,000 URLs (5,000 list pages + 5,000 detail pages)
- - **Source**: HuggingFace Datasets

- ### Data Distribution

- | Type | Count | Share |
- |------|-------|-------|
- | List Page | 5,000 | 50% |
- | Detail Page | 5,000 | 50% |

- ## ⚙️ Training Configuration

- ```json
- {
-   "base_model": "Qwen/Qwen2.5-1.5B",
-   "lora_rank": 16,
-   "lora_alpha": 32,
-   "lora_dropout": 0.05,
-   "num_train_epochs": 3,
-   "per_device_train_batch_size": 2,
-   "gradient_accumulation_steps": 8,
-   "learning_rate": 2e-4,
-   "fp16": true,
-   "optimizer": "adamw_torch",
-   "lr_scheduler_type": "cosine"
- }
- ```

- ## 📈 Evaluation

- ### Test Results

- | Test set | Samples | Accuracy |
- |----------|---------|----------|
- | Validation set | 100 | **99%** |

- ### Example Predictions

- | URL | Prediction |
- |-----|------------|
- | `https://example.com/products/category` | List Page |
- | `https://example.com/product/12345` | Detail Page |
- | `https://example.com/search?q=test` | List Page |
- | `https://example.com/item/abc123` | Detail Page |
- | `https://example.com/list/all` | List Page |

- ## 🚀 Quick Start

- ### Install Dependencies

- ```bash
- pip install transformers peft torch
- ```

- ### Inference Code

- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM

- model_name = "windlx/url-classifier-model"
- tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
- model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

- # URL to classify
- url = "https://example.com/product/12345"

- # Build the prompt; the Chinese template reads "Please decide whether the
- # following URL is a list page or a detail page." followed by "Type:"
- prompt = f"""请判断以下URL是列表页还是详情页。

- URL: {url}
- 类型: """

- # Run inference
- inputs = tokenizer(prompt, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
- response = tokenizer.decode(outputs[0], skip_special_tokens=True)

- # Extract the result ("详情页" = detail page, "列表页" = list page)
- if "详情页" in response or "Detail Page" in response:
-     result = "详情页 (Detail Page)"
- else:
-     result = "列表页 (List Page)"

- print(f"URL: {url}")
- print(f"Type: {result}")
- ```

- ### Using a GPU

- ```python
- # Place the model on the GPU automatically
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     trust_remote_code=True,
-     device_map="auto",
-     torch_dtype="auto"
- )
- ```

- ### Using the CPU

- ```python
- # Force CPU execution
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     trust_remote_code=True,
-     device_map="cpu",
-     torch_dtype="float32"
- )
- ```

- ## ⚠️ Limitations

- 1. **URL string only** - the model never fetches the actual page content
- 2. **Relies on conventional URL paths** - accuracy may drop on sites with unconventional URL structures
- 3. **Chinese and English only** - optimized mainly for Chinese-language URLs

- ## 📝 Use Cases

- - 🔍 **Search engine optimization (SEO)** - map out a site's page structure
- - 🕷️ **Web crawling** - classify links to optimize the crawl strategy
- - 📊 **Site analytics** - measure the ratio of list pages to detail pages
- - 🔗 **Link classification** - large-scale URL classification pipelines

- ## 📁 Related Links

- - **GitHub repository**: https://github.com/xiuxiu/url-classifier
- - **HuggingFace model**: https://huggingface.co/windlx/url-classifier-model
- - **Training dataset**: https://huggingface.co/datasets/IowaCat/page_type_inference_dataset

- ## 🙏 Acknowledgements

- - [Qwen](https://github.com/QwenLM/Qwen2) - the base model
- - [LoRA](https://github.com/microsoft/LoRA) - the parameter-efficient fine-tuning method
- - [HuggingFace](https://huggingface.co/) - model hosting

- ## 📄 License

- [LICENSE](LICENSE) - MIT License
  ---
+ base_model: Qwen/Qwen2.5-1.5B
+ library_name: peft
  pipeline_tag: text-generation
  tags:
+ - base_model:adapter:Qwen/Qwen2.5-1.5B
  - lora
+ - transformers
  ---

+ # Model Card for Model ID

+ <!-- Provide a quick summary of what the model is/does. -->

+ ## Model Details

+ ### Model Description

+ <!-- Provide a longer summary of what this model is. -->

+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]

+ ### Model Sources [optional]

+ <!-- Provide the basic links for the model. -->

+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]

+ ## Uses

+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

+ ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ [More Information Needed]

+ ### Downstream Use [optional]

+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ [More Information Needed]

+ ### Out-of-Scope Use

+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

+ [More Information Needed]

+ ## Bias, Risks, and Limitations

+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ [More Information Needed]

+ ### Recommendations

+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

+ ## How to Get Started with the Model

+ Use the code below to get started with the model.

+ [More Information Needed]

+ ## Training Details

+ ### Training Data

+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ [More Information Needed]

+ ### Training Procedure

+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ #### Preprocessing [optional]

+ [More Information Needed]

+ #### Training Hyperparameters

+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

+ #### Speeds, Sizes, Times [optional]

+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

+ [More Information Needed]

+ ## Evaluation

+ <!-- This section describes the evaluation protocols and provides the results. -->

+ ### Testing Data, Factors & Metrics

+ #### Testing Data

+ <!-- This should link to a Dataset Card if possible. -->

+ [More Information Needed]

+ #### Factors

+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

+ [More Information Needed]

+ #### Metrics

+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->

+ [More Information Needed]

+ ### Results

+ [More Information Needed]

+ #### Summary

+ ## Model Examination [optional]

+ <!-- Relevant interpretability work for the model goes here -->

+ [More Information Needed]

+ ## Environmental Impact

+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]

+ ## Technical Specifications [optional]

+ ### Model Architecture and Objective

+ [More Information Needed]

+ ### Compute Infrastructure

+ [More Information Needed]

+ #### Hardware

+ [More Information Needed]

+ #### Software

+ [More Information Needed]

+ ## Citation [optional]

+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

+ **BibTeX:**

+ [More Information Needed]

+ **APA:**

+ [More Information Needed]

+ ## Glossary [optional]

+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

+ [More Information Needed]

+ ## More Information [optional]

+ [More Information Needed]

+ ## Model Card Authors [optional]

+ [More Information Needed]

+ ## Model Card Contact

+ [More Information Needed]

+ ### Framework versions

+ - PEFT 0.17.1
adapter_config.json CHANGED
@@ -25,13 +25,13 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "v_proj",
  "k_proj",
- "down_proj",
  "q_proj",
  "o_proj",
  "gate_proj",
- "up_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",

  "rank_pattern": {},
  "revision": null,
  "target_modules": [
  "k_proj",
+ "up_proj",
  "q_proj",
  "o_proj",
+ "v_proj",
  "gate_proj",
+ "down_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
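The adapter_config.json hunk above removes and re-adds entries, but as a set the targeted modules are identical before and after; the change only reorders the list, so the adapter should behave the same (PEFT matches `target_modules` by name, not position). A quick sketch of the check:

```python
# target_modules before and after this commit, copied from the diff above.
before = ["v_proj", "k_proj", "down_proj", "q_proj", "o_proj", "gate_proj", "up_proj"]
after = ["k_proj", "up_proj", "q_proj", "o_proj", "v_proj", "gate_proj", "down_proj"]

# Same seven attention/MLP projections of the Qwen2 block, just reordered.
assert set(before) == set(after)
print(sorted(after))
```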
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a4acdea42620010d2624c8891ffed23957ea2f954f52655777dccd43b9630af8
  size 73911112

  version https://git-lfs.github.com/spec/v1
+ oid sha256:6471e71b043d9d5e44fc81e39173c9803dc98c6be2fa5a55d540cc1f1575db0c
  size 73911112
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
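The template above is Qwen's standard ChatML format. For intuition, its simple no-tools path can be mimicked with plain string building (this is only a rough Python sketch, not the actual Jinja file, and it ignores the tool-call branches):

```python
def render(messages, add_generation_prompt=True):
    """Approximate the no-tools path of the chat template above."""
    out = []
    if messages and messages[0]["role"] == "system":
        out.append("<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n")
        messages = messages[1:]
    else:
        # Default system prompt, as in the template's else-branch.
        out.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
    for m in messages:
        out.append("<|im_start|>" + m["role"] + "\n" + m["content"] + "<|im_end|>\n")
    if add_generation_prompt:
        out.append("<|im_start|>assistant\n")
    return "".join(out)

prompt = render([{"role": "user", "content": "https://example.com/product/12345"}])
print(prompt)
```

In practice you would not call this by hand; `tokenizer.apply_chat_template(...)` renders the real Jinja template shipped in this commit.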
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:237a47a630887bed620ce8be916d7e98f376b70d5443cefd302892184065113d
+ size 148047722
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:726d0d4bb83272af9b45ee5c326944b754813bd1cbcb101424c851a7cad990d7
+ size 14244
scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c852b1c266b81ad33622067d7dadb27814ab6638ea8e4f9cd67d3515606d043d
+ size 988
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:75eb251ac935d5db6e2f7cdb5cf5207c336af119368fdfc4b26832ab78ba7599
+ size 1064
trainer_state.json ADDED
@@ -0,0 +1,139 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.24,
+ "eval_steps": 500,
+ "global_step": 300,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.016,
+ "grad_norm": 1.1355500221252441,
+ "learning_rate": 0.000199392,
+ "loss": 2.2663,
+ "step": 20
+ },
+ {
+ "epoch": 0.032,
+ "grad_norm": 0.7390090227127075,
+ "learning_rate": 0.000198752,
+ "loss": 1.3075,
+ "step": 40
+ },
+ {
+ "epoch": 0.048,
+ "grad_norm": 0.8225588798522949,
+ "learning_rate": 0.00019811200000000002,
+ "loss": 1.2692,
+ "step": 60
+ },
+ {
+ "epoch": 0.064,
+ "grad_norm": 0.39007440209388733,
+ "learning_rate": 0.00019747200000000002,
+ "loss": 1.2632,
+ "step": 80
+ },
+ {
+ "epoch": 0.08,
+ "grad_norm": 0.5560928583145142,
+ "learning_rate": 0.00019683200000000003,
+ "loss": 1.2401,
+ "step": 100
+ },
+ {
+ "epoch": 0.096,
+ "grad_norm": 0.6265106797218323,
+ "learning_rate": 0.000196192,
+ "loss": 1.2026,
+ "step": 120
+ },
+ {
+ "epoch": 0.112,
+ "grad_norm": 0.41324910521507263,
+ "learning_rate": 0.00019555200000000001,
+ "loss": 1.2117,
+ "step": 140
+ },
+ {
+ "epoch": 0.128,
+ "grad_norm": 0.6018273234367371,
+ "learning_rate": 0.000194912,
+ "loss": 1.243,
+ "step": 160
+ },
+ {
+ "epoch": 0.144,
+ "grad_norm": 0.7157287001609802,
+ "learning_rate": 0.000194272,
+ "loss": 1.2232,
+ "step": 180
+ },
+ {
+ "epoch": 0.16,
+ "grad_norm": 0.318360835313797,
+ "learning_rate": 0.000193632,
+ "loss": 1.2553,
+ "step": 200
+ },
+ {
+ "epoch": 0.176,
+ "grad_norm": 0.37768101692199707,
+ "learning_rate": 0.000192992,
+ "loss": 1.2367,
+ "step": 220
+ },
+ {
+ "epoch": 0.192,
+ "grad_norm": 0.28085458278656006,
+ "learning_rate": 0.000192352,
+ "loss": 1.228,
+ "step": 240
+ },
+ {
+ "epoch": 0.208,
+ "grad_norm": 0.3715125024318695,
+ "learning_rate": 0.000191712,
+ "loss": 1.2367,
+ "step": 260
+ },
+ {
+ "epoch": 0.224,
+ "grad_norm": 0.5925875306129456,
+ "learning_rate": 0.000191072,
+ "loss": 1.2121,
+ "step": 280
+ },
+ {
+ "epoch": 0.24,
+ "grad_norm": 0.5892595648765564,
+ "learning_rate": 0.000190432,
+ "loss": 1.2138,
+ "step": 300
+ }
+ ],
+ "logging_steps": 20,
+ "max_steps": 6250,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 5,
+ "save_steps": 100,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 9797016276172800.0,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
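The learning rates in `log_history` above decrease by exactly 2e-4 / 6250 per step, i.e. they are consistent with a linear decay from the 2e-4 peak over `max_steps` = 6250. A quick check against three logged values (the `step - 1` offset is an assumption about the trainer's bookkeeping, inferred from the numbers themselves):

```python
# Values copied from the trainer_state.json log above.
max_steps = 6250
peak_lr = 2e-4
logged = {20: 0.000199392, 160: 0.000194912, 300: 0.000190432}

# Hypothesis: linear decay, lr(step) = peak_lr * (1 - (step - 1) / max_steps).
for step, lr in logged.items():
    predicted = peak_lr * (1 - (step - 1) / max_steps)
    assert abs(predicted - lr) < 1e-12, (step, predicted, lr)
print("log is consistent with a linear schedule")
```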
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0ff569697c91cff7f682e3cfc25e55c28071865f1ed4367d3e61f66ff34c8dd
+ size 5432