Update notes.txt
notes.txt
CHANGED
@@ -1,3 +1,22 @@
+Description:
+
+- trained for text classification of the type of net zero target. Text is from company ESG reports; labels come from Net Zero Tracker.
+- text was truncated to 128 tokens during tokenization.
+-
+
+
+
+Problems:
 - keeps outputting the same label regardless of input
+- the text column is quite unstructured: entries vary in length, some include URLs and some don't, some include excerpts from ESG reports, etc.
+- truncation to 128 tokens might have cut off relevant text
+- should try text generation task instead
+- too many labels make the model behave poorly.
+
+
+
+Moving Forward:
 
+- better text preprocessing (remove URLs, etc.)
+- change task to text generation, which might perform better (this means ClimateBert cannot be used as the base model).
--
+-
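A minimal sketch of the preprocessing and sanity checks the notes point toward, using only the Python standard library. Everything named here is an assumption for illustration: the helper names (`clean_text`, `share_over_limit`, `majority_share`), the example rows, and the label strings are hypothetical and not taken from the actual dataset. The length check counts whitespace-separated words, which is only a rough proxy for the 128-token limit, since a subword tokenizer typically produces at least as many tokens as words.

```python
import re
from collections import Counter

# Matches http(s) links and bare "www." links up to the next whitespace.
# Note: trailing punctuation attached to a URL is removed along with it.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URLs, then collapse runs of whitespace into single spaces."""
    without_urls = URL_RE.sub(" ", text)
    return WS_RE.sub(" ", without_urls).strip()


def share_over_limit(texts, limit=128):
    """Fraction of texts with more than `limit` whitespace-separated words.

    Rough proxy only: the real check should use the model's tokenizer.
    """
    if not texts:
        return 0.0
    over = sum(1 for t in texts if len(t.split()) > limit)
    return over / len(texts)


def majority_share(labels):
    """Return the most common label and the fraction of rows carrying it."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Hypothetical rows standing in for (text, label) pairs from the dataset.
rows = [
    ("See https://example.com/report.pdf for our net zero pledge", "net zero"),
    ("We aim for carbon neutrality by 2040   www.example.org", "carbon neutral"),
    ("Net zero by 2050 across all scopes", "net zero"),
]

texts = [clean_text(text) for text, _ in rows]
labels = [label for _, label in rows]

print(texts)
print("share of texts over the limit:", share_over_limit(texts, limit=128))
print("majority label and share:", majority_share(labels))
```

If a large share of texts exceeds the limit, truncation is likely discarding information. If one label dominates the data, a model that always predicts that label can still score well on accuracy, which may explain the constant output.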