lutfiy release
Collection: lutfiy: Southern Uzbek Machine Translation (2 items)
This repository contains an initial machine translation model for the Southern Uzbek language, developed as part of a research paper (coming soon).
| Model | Tokenizer Length | Parameter Count |
|---|---|---|
| tarjimon-uzs | 256,204 | 615M |
Common attributes:
These models are designed for machine translation tasks involving the Southern Uzbek language. They can be used to translate between Southern Uzbek, Uzbek, and English.
You can use these models with the Transformers library. Here's a quick example:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model from the Hugging Face Hub
model_ckpt = "tahrirchi/tarjimon-uzs"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(model_ckpt)

# Example translation
input_text = "O'zbekiston kelajagi buyuk davlatdir."
tokenizer.src_lang = "uzn_Latn"  # source language code
tokenizer.tgt_lang = "uzs_Arab"  # target language code

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text)  # اۉزبېکستان کېلهجگی بویوک دولت دیر.
```
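To translate several sentences at once, or to switch direction, the example above can be wrapped in a small helper. The sketch below is illustrative only: the `translate` function and its parameters are hypothetical names, not part of this repository. It passes the target language through `forced_bos_token_id`, which is the usual pattern for NLLB-based checkpoints in Transformers; this card does not state whether the checkpoint requires it, so treat that as an assumption, along with the assumption that each language code (for example `uzs_Arab`) is a token in the tokenizer's vocabulary.

```python
def translate(texts, tokenizer, model, src_lang, tgt_lang, max_new_tokens=128):
    """Translate a string or a list of strings (hypothetical helper).

    Assumes an NLLB-style tokenizer in which language codes such as
    "uzn_Latn" or "uzs_Arab" are vocabulary tokens.
    """
    # Accept a single string for convenience
    if isinstance(texts, str):
        texts = [texts]

    # Tell the tokenizer which language the input is written in
    tokenizer.src_lang = src_lang

    # Pad so that sentences of different lengths fit in one batch
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

    # Force the decoder to start with the target language token
    # (assumption: the checkpoint follows the usual NLLB convention)
    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Usage, with `tokenizer` and `model` loaded as in the example above:
# translate(["O'zbekiston kelajagi buyuk davlatdir."], tokenizer, model,
#           src_lang="uzn_Latn", tgt_lang="uzs_Arab")
```

For English, the base NLLB tokenizer uses the code `eng_Latn`; whether this fine-tuned checkpoint keeps that code unchanged is likewise an assumption worth checking against the tokenizer's vocabulary.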
We believe this work will enable and inspire enthusiasts around the world to uncover the hidden beauty of low-resource languages, in particular Southern Uzbek.
For questions about further development or issues with the dataset, please contact [email protected] or [email protected].
Base model: facebook/nllb-200-distilled-600M