fix language name
Browse files

README.md — CHANGED

@@ -1,6 +1,6 @@
 ---
 language:
--
+- sw
 inference: false
 tags:
 - BERT
@@ -9,9 +9,10 @@ tags:
 license: apache-2.0
 datasets:
 - HPLT/HPLT2.0_cleaned
+pipeline_tag: fill-mask
 ---
 
-# HPLT v2.0 for
+# HPLT v2.0 for Swahili
 
 <img src="https://hplt-project.org/_next/static/media/logo-hplt.d5e16ca5.svg" width=12.5%>
 
@@ -40,8 +41,8 @@ This model currently needs a custom wrapper from `modeling_ltgbert.py`, you shou
 import torch
 from transformers import AutoTokenizer, AutoModelForMaskedLM
 
-tokenizer = AutoTokenizer.from_pretrained("HPLT/
-model = AutoModelForMaskedLM.from_pretrained("HPLT/
+tokenizer = AutoTokenizer.from_pretrained("HPLT/hplt_bert_base_swh-Latn")
+model = AutoModelForMaskedLM.from_pretrained("HPLT/hplt_bert_base_swh-Latn", trust_remote_code=True)
 
 mask_id = tokenizer.convert_tokens_to_ids("[MASK]")
 input_text = tokenizer("It's a beautiful[MASK].", return_tensors="pt")
@@ -60,13 +61,13 @@ We are releasing 10 intermediate checkpoints for each model at intervals of ever
 
 You can load a specific model revision with `transformers` using the argument `revision`:
 ```python
-model = AutoModelForMaskedLM.from_pretrained("HPLT/
+model = AutoModelForMaskedLM.from_pretrained("HPLT/hplt_bert_base_swh-Latn", revision="step21875", trust_remote_code=True)
 ```
 
 You can access all the revisions for the models with the following code:
 ```python
 from huggingface_hub import list_repo_refs
-out = list_repo_refs("HPLT/
+out = list_repo_refs("HPLT/hplt_bert_base_swh-Latn")
 print([b.name for b in out.branches])
 ```
 
@@ -102,4 +103,4 @@ print([b.name for b in out.branches])
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2503.10267},
 }
-```
+```

[NOTE(review): the four removed ("-") code lines above are truncated at `"HPLT/` in the source extraction; their original remainder is not recoverable from this page and has been left as extracted rather than reconstructed.]