ty1413
/

NetZeroTarget_Classification

Text Classification

text-embeddings-inference


ty1413 committed on Jul 26, 2024

Commit

bd81b64

verified

1 Parent(s): 8c4da2f

Update notes.txt

Files changed (1)

notes.txt +20 -1

notes.txt

@@ -1,3 +1,22 @@
 - keeps outputting the same label regardless of input
-- should try text generation task instead

+Description:
+- trained to classify the type of net zero target. Text is from company ESG reports; labels are from Net Zero Tracker.
+- text was truncated to 128 tokens before tokenization.
+-
+Problems:
 - keeps outputting the same label regardless of input
+- The text column is quite unstructured: entries vary in length, some include URLs and some don't, some include excerpts from ESG reports, etc.
+- truncation might have resulted in loss of data
+- should try text generation task instead
+- too many labels make the model behave poorly.
+Moving Forward:
+- better text preprocessing (remove URLs, etc.)
+- change task to text generation, which might perform better. (This means ClimateBert cannot be used as the base model.)
+-
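
A minimal sketch of the preprocessing and label checks mentioned in the notes above, assuming the text column holds plain strings. The names `clean_text` and `majority_share` and the URL pattern are illustrative assumptions, not part of this repository.

```python
import re
from collections import Counter

# Assumption: a URL starts with http://, https:// or www. and runs until
# the next whitespace. Real ESG text may need a stricter pattern.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def clean_text(text: str) -> str:
    """Remove URLs, then collapse runs of whitespace into single spaces."""
    without_urls = URL_RE.sub(" ", text)
    return " ".join(without_urls.split())


def majority_share(labels) -> float:
    """Return the fraction of examples that carry the most common label.

    A value close to 1.0 suggests the label set is heavily imbalanced,
    which is one possible reason a classifier predicts a single label.
    """
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)
```

Running `clean_text` over the text column before tokenization keeps URLs from using up part of the 128-token budget. Comparing `majority_share` on the training labels with the same figure on the model's predictions is one quick way to see how far the predictions have collapsed onto a single class.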