Spaces: crpatel/Gujarati-BPE-Tokenizer

crpatel committed on Jan 6

Commit 56a0cfd

1 parent: 4656a31

vocab corpus increased

Files changed (1)

app.py CHANGED

@@ -13,7 +13,7 @@ class DecodeRequest(BaseModel):
     tokens: str
 # Initialize the tokenizer
-tokenizer = BPEGujaratiTokenizer(corpus_path="gu_corpus.txt", max_vocab_size=5000, sample_size=275000)
+tokenizer = BPEGujaratiTokenizer(corpus_path="gu_corpus.txt", max_vocab_size=5000, sample_size=50000)
 app = FastAPI()
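
The diff does not show how `sample_size` is used inside `BPEGujaratiTokenizer`. One plausible reading, and it is only an assumption, is that it caps the number of corpus lines used when learning BPE merges. A minimal sketch of that kind of sampling follows; the helper name `sample_corpus_lines` and its `seed` parameter are hypothetical and do not come from the repository.

```python
import random


def sample_corpus_lines(lines, sample_size, seed=0):
    """Return at most `sample_size` lines, drawn uniformly without replacement.

    Hypothetical helper: the real tokenizer may sample differently
    (or by characters or words rather than lines).
    """
    lines = list(lines)
    # If the corpus is smaller than the requested sample, use all of it.
    if sample_size >= len(lines):
        return lines
    # A seeded generator keeps the training sample reproducible across restarts.
    rng = random.Random(seed)
    return rng.sample(lines, sample_size)
```

Under this assumption, changing `sample_size` from 275000 to 50000 would mean the tokenizer is trained on fewer lines at startup, while `max_vocab_size` stays the same.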