Change example and fix preprocessing script link
README.md
CHANGED
@@ -26,10 +26,12 @@ ner_pipeline = pipeline("token-classification",
 aggregation_strategy="max")
 
 # Apply it to some text
-ner_pipeline("
+ner_pipeline("ZNF598 is a Zinc finger containing E3 ubiquitin ligase.")
 
 # Output:
-# [ {"entity_group": "
+# [ {"entity_group": "Gene", "score": 0.99889, "word": "znf598", "start": 0, "end": 6},
+# {"entity_group": "DomainMotif", "score": 0.74961, "word": "zinc finger", "start": 12, "end": 23},
+# {"entity_group": "FamilyName", "score": 0.89084, "word": "e3 ubiquitin ligase", "start": 35, "end": 54} ]
 ```
 
 ## Dataset Info
@@ -38,7 +40,7 @@ ner_pipeline("EGFR T790M mutations have been known to affect treatment outcomes
 
 The dataset should be cited with: Wei, Chih-Hsuan, Hung-Yu Kao, and Zhiyong Lu. "GNormPlus: an integrative approach for tagging genes, gene families, and protein domains." BioMed research international 2015.1 (2015): 918710. DOI: [10.1155/2015/918710](https://doi.org/10.1155/2015/918710)
 
-**Preprocessing:** The training set was split 75/25 to create a training and validation set. No changes were made to the annotations. The preprocessing script for this dataset is [
+**Preprocessing:** The training set was split 75/25 to create a training and validation set. No changes were made to the annotations. The preprocessing script for this dataset is [prepare_gnormplus.py](https://github.com/Glasgow-AI4BioMed/bioner/blob/main/prepare_gnormplus.py).
 
 ## Performance
 
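In the new example output, the `word` field appears lowercased ("znf598"), while `start` and `end` are character offsets into the input sentence. A minimal sketch of recovering the original-case surface text from those offsets follows. The sentence and entity dictionaries are copied from the example above; the helper name `original_spans` is hypothetical and not part of the README or of any library.

```python
# Sentence and entity dictionaries copied from the README example output.
text = "ZNF598 is a Zinc finger containing E3 ubiquitin ligase."
entities = [
    {"entity_group": "Gene", "score": 0.99889, "word": "znf598", "start": 0, "end": 6},
    {"entity_group": "DomainMotif", "score": 0.74961, "word": "zinc finger", "start": 12, "end": 23},
    {"entity_group": "FamilyName", "score": 0.89084, "word": "e3 ubiquitin ligase", "start": 35, "end": 54},
]


def original_spans(text, entities):
    """Hypothetical helper: pair each label with the original-case text.

    Slices the input with the character offsets instead of relying on
    the `word` field, which is lowercased in the example output.
    """
    return [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]


spans = original_spans(text, entities)
for label, surface in spans:
    print(f"{label}: {surface}")
```

Slicing by offset keeps the casing and spacing of the input, which can matter when entity mentions are matched against a gene or protein vocabulary downstream.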