Commit 003704e by bastitx (1 parent: a51c030)

add Aleph-Alpha-GermanWeb-Quality-Classifier-fastText model

Files changed (2)
  1. README.md +31 -0
  2. model.bin +3 -0
README.md ADDED
@@ -0,0 +1,31 @@
+ ---
+ language:
+ - de
+ library_name: fasttext
+ ---
+ # Aleph-Alpha-GermanWeb-Quality-Classifier-fastText
+
+ ## Example Snippet
+
+ ```python
+ import fasttext
+ from huggingface_hub import hf_hub_download
+
+
+ model_path = hf_hub_download(repo_id="Aleph-Alpha/Aleph-Alpha-Quality-Classifier-fastText", filename="model.bin")
+ model = fasttext.load_model(model_path)
+
+ text = "Das ist ein Beispieltext, um die Qualität zu überprüfen."
+
+ pre_processed_document = text.replace("\n", " ")
+
+ predicted_class, prob = model.predict(pre_processed_document)
+ predicted_label = predicted_class[0].replace("__label__", "")
+ document_score = prob[0]
+ # Similar to https://github.com/NVIDIA/NeMo-Curator/blob/31c8171434205e62f6a7d38565ffd9cb4c2806b7/nemo_curator/filters/classifier_filter.py#L47, the document score is the probability of the predicted class if the predicted label is 'high_quality'; otherwise it is 1 minus that probability.
+
+ if predicted_label != "high_quality":
+     document_score = 1 - document_score
+
+ print(predicted_label, document_score)
+ ```
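The scoring rule in the README snippet can be isolated into a small helper that works on the `(labels, probabilities)` pair returned by `model.predict`. This is a minimal sketch, not part of the commit: the names `quality_score` and `is_high_quality`, the `low_quality` label used in testing, and the 0.5 threshold are illustrative assumptions. It also assumes the classifier is binary, which is what makes `1 - p` a valid estimate of the `high_quality` probability.

```python
from typing import Sequence, Tuple

LABEL_PREFIX = "__label__"
HIGH_QUALITY = "high_quality"


def quality_score(labels: Sequence[str], probs: Sequence[float]) -> Tuple[str, float]:
    """Turn a top-1 fastText prediction into (label, P(high_quality)).

    Assumes a binary classifier, so P(high_quality) = 1 - P(other label).
    """
    if len(labels) == 0 or len(labels) != len(probs):
        raise ValueError("expected one probability per predicted label")
    # Strip the fastText label prefix, as the README snippet does.
    label = labels[0].replace(LABEL_PREFIX, "", 1)
    score = float(probs[0])
    if label != HIGH_QUALITY:
        score = 1.0 - score
    # Precaution: keep the result inside [0, 1] in case of float rounding.
    return label, min(1.0, max(0.0, score))


def is_high_quality(score: float, threshold: float = 0.5) -> bool:
    """Hypothetical filter rule; the 0.5 threshold is an illustrative default."""
    return score >= threshold
```

With the model loaded as in the README, the call would look like `label, score = quality_score(*model.predict(pre_processed_document))`. Keeping the helper free of any fastText import makes the rule easy to unit-test without downloading the model.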
model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5983edf3f6988b199654a6f39489b72df65800eabe135986ed35cd24f758523f
+ size 2471945221
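The three added lines are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size of `model.bin` (2,471,945,221 bytes, roughly 2.47 GB), while the binary itself lives in LFS storage. As a hedged illustration, a downloaded copy could be checked against such a pointer as follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this sketch, not part of Git LFS or `huggingface_hub`.

```python
import hashlib
from pathlib import Path
from typing import Union


def parse_lfs_pointer(pointer_text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    # The oid has the form '<algorithm>:<hex digest>', e.g. 'sha256:5983...'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Union[str, Path], pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest in `pointer`."""
    path = Path(path)
    # Cheap check first: a size mismatch avoids hashing a multi-gigabyte file.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so the whole model never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

A caller would pass the pointer text from the diff above to `parse_lfs_pointer` and the local path of the downloaded `model.bin` to `matches_pointer`.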