ty1413 committed on
Commit bd81b64 · verified · 1 Parent(s): 8c4da2f

Update notes.txt

Files changed (1)
  1. notes.txt +20 -1
notes.txt CHANGED
@@ -1,3 +1,22 @@
+Description:
+
+- trained on text classification of net zero target type. Text is from company ESG reports; data is labelled by Net Zero Tracker.
+- text was truncated to 128 tokens before tokenization.
+-
+
+
+
+Problems:
 - keeps outputting the same label regardless of input
+- The text column is quite unstructured: entries vary in length, some include URLs and some don't, some include excerpts from ESG reports, etc.
+- truncation might have resulted in loss of data
+- should try text generation task instead
+- too many labels make the model behave poorly.
+
+
+
+Moving Forward:
 
-- should try text generation task instead
+- better text preprocessing: remove URLs, etc.
+- change task to text generation, which might perform better (this means ClimateBert cannot be used as the base model).
+-
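The "better text preprocessing" item under Moving Forward could start as something like the minimal sketch below, assuming plain regular-expression cleaning is acceptable. It strips anything that looks like a URL (starting with http://, https:// or www.) and collapses runs of whitespace. The function name `clean_text` and the URL pattern are hypothetical choices, not the preprocessing actually used for this model.

```python
import re

# Hypothetical cleaning rules (assumption): the notes do not specify the exact steps.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", flags=re.IGNORECASE)
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URL-like substrings, then collapse all whitespace to single spaces."""
    without_urls = URL_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", without_urls).strip()
```

For example, `clean_text("Net zero by 2050.\n See https://example.com/report.pdf  for details.")` returns `"Net zero by 2050. See for details."`. Excerpt markers or other boilerplate in the ESG text would need their own rules once their patterns are known.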
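One way to avoid the data loss noted under Problems when text is cut at 128 tokens is to split each long text into overlapping windows and classify every window. The sketch below assumes the text has already been turned into a list of tokens (whitespace splitting stands in for the real tokenizer here). `sliding_windows`, `max_len` and `step` are hypothetical names, and consecutive windows overlap by `max_len - step` tokens.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[str], max_len: int = 128, step: int = 64) -> List[List[str]]:
    """Split tokens into windows of at most max_len tokens, starting every `step` tokens.

    Unlike keeping only the first max_len tokens, every token lands in at least one window.
    """
    if max_len <= 0 or step <= 0 or step > max_len:
        raise ValueError("require 0 < step <= max_len")
    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(tokens):
            break
        start += step
    return windows
```

For instance, `sliding_windows("a b c d e f g h i j".split(), max_len=4, step=2)` gives four windows, the last being `['g', 'h', 'i', 'j']`. At prediction time the per-window outputs would still need to be combined, for example by averaging probabilities or taking a majority vote; which works better for this data is untested. Hugging Face tokenizers offer a comparable built-in option through the `return_overflowing_tokens` and `stride` arguments (there `stride` is the number of overlapping tokens, not the step size).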
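Before changing the task, it may help to quantify the "same label regardless of input" problem on a held-out set. This hedged sketch assumes gold labels and model predictions are available as plain lists of strings. It reports how many distinct labels the model predicts, the share of its most frequent prediction, its accuracy, and the majority-class baseline (always predicting the most common gold label). `prediction_report` and the 0.9 collapse threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Union


def prediction_report(
    y_true: Sequence[str], y_pred: Sequence[str], collapse_threshold: float = 0.9
) -> Dict[str, Union[int, float, bool]]:
    """Summarise whether predictions have collapsed onto one label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    pred_counts = Counter(y_pred)
    top_prediction_share = pred_counts.most_common(1)[0][1] / n
    # Accuracy of always predicting the most frequent gold label.
    majority_baseline = Counter(y_true).most_common(1)[0][1] / n
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {
        "distinct_predictions": len(pred_counts),
        "top_prediction_share": top_prediction_share,
        "accuracy": accuracy,
        "majority_baseline": majority_baseline,
        "collapsed": top_prediction_share >= collapse_threshold,
    }
```

If `collapsed` is true and `accuracy` is close to `majority_baseline`, the model has likely learned little beyond the label prior, which often points to class imbalance or many sparsely populated classes.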
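To act on "too many labels make the model behave poorly", one option is to fold infrequent classes into a single catch-all class before training. A minimal sketch, assuming labels are strings: the mapping is built from training labels only and then applied to every split. `build_label_map`, `apply_label_map`, `min_count` and the `"Other"` class name are hypothetical and do not correspond to Net Zero Tracker categories.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_label_map(
    train_labels: Sequence[str], min_count: int = 50, other: str = "Other"
) -> Dict[str, str]:
    """Keep labels seen at least min_count times in training; map the rest to `other`."""
    counts = Counter(train_labels)
    return {label: (label if n >= min_count else other) for label, n in counts.items()}


def apply_label_map(
    labels: Sequence[str], label_map: Dict[str, str], other: str = "Other"
) -> List[str]:
    """Relabel any split; labels never seen in training also fall back to `other`."""
    return [label_map.get(label, other) for label in labels]
```

The right `min_count` depends on the label distribution; a very large `"Other"` class can itself become the label the model always predicts.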
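If the task is switched to text generation as proposed, each labelled example has to become an input/output text pair. A minimal sketch, assuming the model is trained to generate the label name after a prompt. The prompt wording, the `to_generation_examples` name and the `prompt`/`target` keys are hypothetical, and the texts are assumed to be cleaned already.

```python
from typing import Dict, List, Sequence

# Hypothetical prompt template (assumption): the notes do not specify one.
PROMPT_TEMPLATE = (
    "Classify the type of net zero target described in this ESG report excerpt.\n\n"
    "Text: {text}\n"
    "Target type:"
)


def to_generation_examples(texts: Sequence[str], labels: Sequence[str]) -> List[Dict[str, str]]:
    """Turn (text, label) pairs into prompt/target pairs for a text-generation model."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    return [
        # Leading space so the label reads naturally after "Target type:".
        {"prompt": PROMPT_TEMPLATE.format(text=text), "target": " " + label}
        for text, label in zip(texts, labels)
    ]
```

At inference time the generated string would still need to be mapped back to a valid label, since free-form generation can produce strings outside the label set.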