Text Classification
fastText
nielsr HF Staff committed on
Commit 95acfd7 · verified · 1 Parent(s): c8b570c

Improve model card with metadata and link to code

This PR adds metadata to the model card, including the `pipeline_tag` and `library_name`. It also adds a link to the GitHub repository for easier access to the code.

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -1,6 +1,13 @@
 ---
 license: mit
+library_name: fasttext
+pipeline_tag: text-classification
 ---
-This is the fastText pretraining data filter targeting
-the SciQ task, discussed in the main text of the Perplexity
-Correlations paper: https://arxiv.org/abs/2409.05816
+
+This is the fastText pretraining data filter targeting the SciQ task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816
+
+This package can be used to get LLM pretraining data sampling distributions using simple statistical methods. The compute requirements are minimal, and you don't need to train any LLMs yourself.
+
+Essentially, this approach encourages training on domains where lower loss is very correlated with higher downstream performance. We can use existing and freely available LLMs to do this.
+
+Code: https://github.com/TristanThrush/perplexity-correlations
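
The idea in the added README text (prefer domains where lower loss goes with higher downstream performance) can be illustrated with a minimal sketch. This is not the code from the linked repository: it uses a plain Spearman rank correlation as a simple stand-in for whatever estimator the paper uses, and the domain names, loss values, benchmark scores, and the `rank_domains` helper are all hypothetical.

```python
# Illustrative sketch only: Spearman rank correlation is an assumed stand-in,
# not necessarily the estimator used in the perplexity-correlations package.

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_domains(losses_by_domain, benchmark_scores):
    """Sort domains so that those where lower loss tracks higher score come first.

    losses_by_domain maps a domain name to one loss per model;
    benchmark_scores holds one downstream score per model, in the same order.
    """
    scored = {
        domain: spearman([-loss for loss in losses], benchmark_scores)
        for domain, losses in losses_by_domain.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers for four hypothetical, already-trained LLMs.
benchmark_scores = [0.55, 0.62, 0.70, 0.81]
losses_by_domain = {
    "science_site": [2.9, 2.6, 2.4, 2.1],  # loss falls as the score rises
    "forum_site": [2.5, 2.7, 2.4, 2.6],    # no clear relationship
    "spam_site": [2.0, 2.2, 2.5, 2.8],     # loss rises as the score rises
}

ranking = rank_domains(losses_by_domain, benchmark_scores)
for domain, correlation in ranking:
    print(f"{domain}: {correlation:+.2f}")
```

Domains at the top of `ranking` are the ones a sampling distribution would favor under this simplified view. Only losses from existing models are needed, which matches the README's point that no LLM has to be trained.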