Improve model card with metadata and description
This PR improves the model card by:
- Adding the `pipeline_tag: text-classification` to better reflect the model's purpose.
- Specifying the `library_name: fasttext` as the model uses fastText for filtering.
- Confirming the `license: mit`.
- Providing a more detailed description of the model and its usage.
- Adding a link to the GitHub repository.
This will improve discoverability and usability of this valuable data filtering resource.
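
The metadata changes listed above amount to three flat keys in the card's YAML front matter. As a rough check, the sketch below reads such a block with the standard library only. The helper name `read_front_matter` is hypothetical, and the parser assumes simple `key: value` lines rather than full YAML.

```python
def read_front_matter(card_text):
    """Return the key/value pairs of a leading '---' delimited block.

    Assumption: the block holds only flat `key: value` lines, as in this PR.
    Nested YAML, lists, and quoting are not handled.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat the block as absent


# Front matter as it appears in the updated README.md of this PR.
card = """---
license: mit
library_name: fasttext
pipeline_tag: text-classification
---

Card body goes here.
"""

metadata = read_front_matter(card)
print(metadata)
```

A card with no leading `---` line yields an empty dictionary, so a missing key can be detected with an ordinary membership test.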
README.md
CHANGED
@@ -1,6 +1,9 @@
 ---
 license: mit
+library_name: fasttext
+pipeline_tag: text-classification
 ---
-
-the LAMBADA FR task, discussed in the main text of the Perplexity
-
+
+This is the fastText pretraining data filter targeting the LAMBADA FR task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816. This filter uses perplexity correlations to identify high-quality pretraining data without requiring any LLM training. It is designed to be used with the `fastText` library.
+
+Github: https://github.com/TristanThrush/perplexity-correlations
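
The new description says the filter is meant to be used with the `fastText` library. As a minimal sketch of what that could look like with the `fasttext` Python package, the block below scores a document and applies a threshold. Several things here are assumptions rather than details from the card: the label name `__label__include`, the 0.5 threshold, the helper names, and the model file path. A small stand-in model is included so the sketch runs without the real file.

```python
def load_filter(model_path):
    """Load a fastText model from a local .bin file (path supplied by the caller)."""
    import fasttext  # imported lazily so the rest of the sketch runs without it
    return fasttext.load_model(model_path)


def score_document(model, text, positive_label="__label__include"):
    """Return the probability the model assigns to `positive_label`.

    Assumption: "__label__include" is a guess at the label name; check the
    labels of the real model (e.g. via `model.labels`) before relying on it.
    """
    # fastText's predict() handles one line at a time, so collapse whitespace
    # (including newlines) into single spaces first.
    cleaned = " ".join(text.split())
    labels, probs = model.predict(cleaned, k=-1)  # k=-1 asks for all labels
    for label, prob in zip(labels, probs):
        if label == positive_label:
            return float(prob)
    return 0.0


def keep_document(model, text, threshold=0.5, positive_label="__label__include"):
    """Keep a document when its score reaches the (assumed) threshold."""
    return score_document(model, text, positive_label) >= threshold


class StubModel:
    """Hypothetical stand-in with the same predict() shape as a fastText model."""

    def predict(self, text, k=1):
        assert "\n" not in text
        return ("__label__include", "__label__exclude"), [0.8, 0.2]


# With the real file this would be: model = load_filter("path/to/model.bin")
model = StubModel()
print(keep_document(model, "Un exemple de document\nsur deux lignes."))
```

The threshold is a design choice left to the user: a higher value keeps less data but with higher predicted quality.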