readme: update model card
README.md
CHANGED
@@ -3,10 +3,10 @@ tags:
 - flair
 - token-classification
 - sequence-tagger-model
-language:
-- en
-- de
-- fr
+language:
+- en
+- de
+- fr
 - it
 - nl
 - pl
@@ -26,7 +26,7 @@ widget:
 
 This is the default multilingual universal part-of-speech tagging model that ships with [Flair](https://github.com/flairNLP/flair/).
 
-F1-Score: **
+F1-Score: **96.87** (12 UD Treebanks covering English, German, French, Italian, Dutch, Polish, Spanish, Swedish, Danish, Norwegian, Finnish and Czech)
 
 Predicts universal POS tags:
 
@@ -94,14 +94,14 @@ Token[6]: "say" → VERB (0.9998)
 Token[7]: "." → PUNCT (1.0)
 ```
 
-So, the words "*Ich*" and "*they*" are labeled as **pronouns** (PRON), while "*liebe*" and "*say*" are labeled as **verbs** (VERB) in the multilingual sentence "*Ich liebe Berlin, as they say*".
+So, the words "*Ich*" and "*they*" are labeled as **pronouns** (PRON), while "*liebe*" and "*say*" are labeled as **verbs** (VERB) in the multilingual sentence "*Ich liebe Berlin, as they say*".
 
 
 ---
 
 ### Training: Script to train this model
 
-The following Flair script was used to train this model:
+The following Flair script was used to train this model:
 
 ```python
 from flair.data import MultiCorpus
@@ -129,11 +129,10 @@ corpus = MultiCorpus([
 tag_type = 'upos'
 
 # 3. make the tag dictionary from the corpus
-tag_dictionary = corpus.
+tag_dictionary = corpus.make_label_dictionary(label_type=tag_type)
 
 # 4. initialize each embedding we use
 embedding_types = [
-
 # contextual string embeddings, forward
 FlairEmbeddings('multi-forward'),
 
@@ -141,7 +140,7 @@ embedding_types = [
 FlairEmbeddings('multi-backward'),
 ]
 
-# embedding stack consists of Flair
+# embedding stack consists of Flair embeddings
 embeddings = StackedEmbeddings(embeddings=embedding_types)
 
 # 5. initialize sequence tagger
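The updated line in step 3 of the training script calls `corpus.make_label_dictionary(label_type=tag_type)`. As a rough illustration of what building a tag dictionary from a corpus amounts to, here is a minimal pure-Python sketch. It does not use Flair and is not Flair's implementation: `build_tag_dictionary`, `TaggedSentence`, and the hand-written `toy_corpus` are hypothetical names and data introduced only for this example.

```python
from typing import Dict, List, Tuple

# A sentence is a list of (token, universal POS tag) pairs.
# Hypothetical type alias, used only in this sketch.
TaggedSentence = List[Tuple[str, str]]

# Hand-written toy data for illustration; not output of the model.
toy_corpus: List[TaggedSentence] = [
    [("Ich", "PRON"), ("liebe", "VERB"), ("Berlin", "PROPN"), (",", "PUNCT")],
    [("as", "SCONJ"), ("they", "PRON"), ("say", "VERB"), (".", "PUNCT")],
]


def build_tag_dictionary(corpus: List[TaggedSentence]) -> Dict[str, int]:
    """Map each distinct tag to an integer index, in order of first appearance."""
    tag_to_index: Dict[str, int] = {}
    for sentence in corpus:
        for _token, tag in sentence:
            if tag not in tag_to_index:
                # Assign the next free index to a tag seen for the first time.
                tag_to_index[tag] = len(tag_to_index)
    return tag_to_index


tag_dictionary = build_tag_dictionary(toy_corpus)
print(tag_dictionary)
# → {'PRON': 0, 'VERB': 1, 'PROPN': 2, 'PUNCT': 3, 'SCONJ': 4}
```

The sketch only collects the distinct tags seen in the data. Whatever else a real label dictionary does (for example, how it treats unseen labels) is outside its scope.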