Improve model card with metadata and description

This PR improves the model card by:

- Adding the `pipeline_tag: text-classification` to better reflect the model's purpose.
- Specifying the `library_name: fasttext` as the model uses fastText for filtering.
- Confirming the `license: mit`.
- Providing a more detailed description of the model and its usage.
- Adding a link to the GitHub repository.

These changes make the filter easier to find through the Hub's task and library filters and make its intended usage clearer.
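
For reviewers who want to sanity-check the metadata, the snippet below is a minimal sketch of reading the front matter from the updated README and confirming that the three keys listed above are present. The helper name `read_front_matter` is hypothetical, and the parser assumes only flat `key: value` pairs between the `---` fences; it is not a full YAML parser and is not part of the Hub tooling.

```python
def read_front_matter(readme_text: str) -> dict:
    """Return flat key/value pairs found between the leading '---' fences.

    Hypothetical helper for illustration: handles only simple `key: value`
    lines, not nested or multi-line YAML.
    """
    lines = readme_text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing fence reached: the metadata block is complete.
            return metadata
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    # No closing fence was found, so treat the block as absent.
    return {}


# Shortened copy of the README after this PR (body text abbreviated).
readme = """---
license: mit
library_name: fasttext
pipeline_tag: text-classification
---
This is the fastText pretraining data filter targeting the LAMBADA FR task.
"""

metadata = read_front_matter(readme)
required = ("license", "library_name", "pipeline_tag")
missing = [key for key in required if key not in metadata]
print(metadata)
print("missing keys:", missing)
```

A real check would more likely use a YAML library; the hand-rolled parser is used here only to keep the sketch dependency-free.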

Files changed (1)

README.md +6 -3

README.md CHANGED

@@ -1,6 +1,9 @@
 ---
 license: mit
 ---
-This is the fastText pretraining data filter targeting
-the LAMBADA FR task, discussed in the main text of the Perplexity
-Correlations paper: https://arxiv.org/abs/2409.05816

 ---
 license: mit
+library_name: fasttext
+pipeline_tag: text-classification
 ---
+This is the fastText pretraining data filter targeting the LAMBADA FR task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816. This filter uses perplexity correlations to identify high-quality pretraining data without requiring any LLM training. It is designed to be used with the `fastText` library.
+Github: https://github.com/TristanThrush/perplexity-correlations