Tarjimon UZS: Southern Uzbek Machine Translation Model

This repository contains an initial machine translation model for the Southern Uzbek language, developed as part of a research paper (coming soon).

Model details

Model          Tokenizer Length    Parameter Count
tarjimon-uzs   256,204             615M

Common attributes:

  • Base Model: nllb-200-600M
  • Languages: Southern Uzbek, Northern Uzbek, English

Intended uses & limitations

This model is designed for machine translation tasks involving the Southern Uzbek language. It can be used to translate between Southern Uzbek and either Northern Uzbek or English.
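As a small illustration of the supported directions, the sketch below enumerates the language-code pairs in which Southern Uzbek is one side. The codes `uzn_Latn` and `uzs_Arab` come from the usage example in this card; `eng_Latn` is the standard NLLB code for English and is assumed, not confirmed, to be the one this checkpoint expects. The names `LANG_CODES` and `southern_uzbek_directions` are hypothetical helpers introduced only for this example.

```python
from itertools import permutations

# Language codes in NLLB style. "eng_Latn" is an assumption based on the
# NLLB convention; the other two codes appear in this card's usage example.
LANG_CODES = {
    "Northern Uzbek": "uzn_Latn",
    "Southern Uzbek": "uzs_Arab",
    "English": "eng_Latn",
}


def southern_uzbek_directions():
    """Return (source, target) code pairs that have Southern Uzbek on one side."""
    return [
        (src, tgt)
        for src, tgt in permutations(LANG_CODES.values(), 2)
        if "uzs_Arab" in (src, tgt)
    ]


for src, tgt in southern_uzbek_directions():
    print(f"{src} -> {tgt}")
```

The pair Northern Uzbek to English is deliberately left out, since the card describes the model as targeting translation that involves Southern Uzbek.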

How to use

You can use this model with the Transformers library. Here's a quick example:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_ckpt = "tahrirchi/tarjimon-uzs"

tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(model_ckpt)

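# Example translation: Northern Uzbek (Latin script) to Southern Uzbek (Arabic script)
input_text = "O'zbekiston kelajagi buyuk davlatdir."

# Language codes follow the NLLB convention: <language>_<script>
tokenizer.src_lang = "uzn_Latn"
tokenizer.tgt_lang = "uzs_Arab"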

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text) # اۉزبېکستان کېلهجگی بویوک دولت دیر.
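For translating several sentences at once, the quick example can be wrapped in a small helper. The sketch below is not part of the original card: `translate` is a hypothetical function, and it assumes the target language code exists as a token in the tokenizer's vocabulary so that it can be passed to `generate` as `forced_bos_token_id`, which is the usual way NLLB-style models select the output language. If the code is not in the vocabulary, the helper raises an error rather than guessing.

```python
def translate(texts, tokenizer, model,
              src_lang="uzn_Latn", tgt_lang="uzs_Arab", max_new_tokens=128):
    """Translate a list of strings with an NLLB-style tokenizer and model.

    Assumption: `tgt_lang` is a language token known to the tokenizer.
    """
    tokenizer.src_lang = src_lang

    # Look up the target language token and refuse to continue if it is unknown.
    tgt_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    if tgt_id == tokenizer.unk_token_id:
        raise ValueError(f"Language code {tgt_lang!r} is not in the tokenizer vocabulary")

    # Pad to the longest sentence so the batch forms a single tensor.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tgt_id,      # first generated token selects the language
        max_new_tokens=max_new_tokens,   # cap the output length
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

With the tokenizer and model loaded as in the example above, `translate([input_text], tokenizer, model)` returns a list with one translated string per input. The default of 128 new tokens is an arbitrary choice for short sentences, not a value taken from the model's configuration.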

Contacts

We believe that this work will enable and inspire enthusiasts around the world to uncover the hidden beauty of low-resource languages, in particular Southern Uzbek.

For questions about further development or issues with the dataset, please contact [email protected] or [email protected].
