Repeating tokens - using Assamese language
So I am running into an issue: the model repeats its tokens while translating. I am translating English->Assamese, and this happens only for certain texts. I know that long contexts can cause this, so I made sure each of my English text sequences is within 300 tokens. I have tried (and wasted compute :)) multiple model parameter settings, but to no avail. Below is my code. Rest assured that the text input is below 300 tokens (according to spaCy).
def translate_text_sarvam(text, target_language="Assamese"):
    """Translate text using Sarvam-Translate model on GPU"""
    global SARVAM_MODEL, SARVAM_TOKENIZER
    # Models should already be loaded by the calling function
    if SARVAM_MODEL is None or SARVAM_TOKENIZER is None:
        SARVAM_MODEL, SARVAM_TOKENIZER = load_models()
    try:
        messages = [
            {
                "role": "system",
                "content": (
                    f"You are a professional translator. Translate the following text to {target_language}. "
                    "Please do not repeat the same word or phrase multiple times."
                )
            },
            {"role": "user", "content": text}
        ]
        text = SARVAM_TOKENIZER.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        model_inputs = SARVAM_TOKENIZER([text], return_tensors="pt").to(SARVAM_MODEL.device)
        input_tokens_size = model_inputs.input_ids.shape[1]
        print(f"Input tokens size: {input_tokens_size}")
        MAX_CONTEXT_SIZE = 8000
        max_new_tokens = MAX_CONTEXT_SIZE - input_tokens_size
        if max_new_tokens < 0:
            raise ValueError("Inputs are too long for the model. Please shorten the input text.")
        max_factor = 3
        with torch.no_grad():
            generated_ids = SARVAM_MODEL.generate(
                **model_inputs,
                do_sample=True,
                temperature=0.01,
                num_return_sequences=1,
                max_new_tokens=input_tokens_size*max_factor
            )
        output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
        output_tokens_size = len(output_ids)
        print(f"Output tokens size: {output_tokens_size}")
        translated_text = SARVAM_TOKENIZER.decode(output_ids, skip_special_tokens=True)
        if output_tokens_size == input_tokens_size*max_factor:
            #Repeating token case
            with open("translated.txt", "w", encoding="utf-8") as f:
                f.write(translated_text)
        return translated_text
    except Exception as e:
        print(f"Error in Sarvam translation: {e}")
        return ""
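The code above treats "output hit the max_new_tokens limit" as the repeating-token case. A more direct check is to look for a short pattern that repeats back-to-back at the end of the generated ids. The following is only a minimal sketch of that idea: the function name `find_repetition_loop` and its thresholds (`max_period`, `min_repeats`) are made up for illustration and are not part of the model or of transformers.

```python
def find_repetition_loop(token_ids, max_period=32, min_repeats=4):
    """Return the period of a trailing repetition loop, or None.

    A loop of period p means the last p tokens occur back-to-back at
    least `min_repeats` times at the very end of `token_ids`.
    Both thresholds are illustrative guesses, not tuned values.
    """
    n = len(token_ids)
    for period in range(1, max_period + 1):
        needed = period * min_repeats
        if needed > n:
            break  # sequence too short to hold this many repeats
        tail = token_ids[n - needed:]
        unit = tail[-period:]
        # Every consecutive chunk of length `period` must equal the last one
        if all(tail[i:i + period] == unit for i in range(0, needed, period)):
            return period
    return None
```

It expects a plain Python list, such as `output_ids` after `.tolist()` in the function above, and could flag a bad generation even when the output stops before the token limit.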
Does this model translate from one regional language to another? I was trying Malayalam to Hindi, but it gives me English in the output, using the code provided on the model page itself. Any idea how to do regional languages to Hindi?
I think it does, coz there is a "source_lang_code" parameter in the Sarvam API template. Check the model card once.
Or if it is not possible through one prompt, then maybe translate to English first and convert the English output to the other regional language as the next step. Will double the compute usage tho :(
I have a 16 GB 5060 Ti and I'm not sure it's enough. I have 3 other models already running on that GPU, so I can't double the compute usage.
Hi @ColdMeat2003. Your code looks good. Can you please share the input text for which the repetition issue is happening?
We can debug and get back.
Also BTW, having just this instruction in the system prompt should be sufficient: Translate the following text to {target_language}.
These additional instructions may not add much value: You are a professional translator. Please do not repeat the same word or phrase multiple times.
(because the model was not explicitly trained on such instructions)
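As a rough sketch of the suggestion above, the messages could be built with only the bare instruction in the system prompt. The helper name `build_translation_messages` is hypothetical; the message layout just mirrors the original post's code.

```python
def build_translation_messages(text, target_language="Assamese"):
    """Build chat messages with only the minimal translation instruction.

    Hypothetical helper: the system prompt wording follows the reply
    above, and the role/content layout mirrors the original post.
    """
    return [
        {
            "role": "system",
            "content": f"Translate the following text to {target_language}.",
        },
        {"role": "user", "content": text},
    ]
```

The returned list can be passed to `apply_chat_template` in place of the `messages` list in the original function.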
Hi @iamgrootns, the model is currently not trained for Indic to Indic translation.
It is trained only for English->Indic and Indic->English.
Until we release a new version supporting any language to any language translation, please do Malayalam->English and then English->Hindi as suggested by @ColdMeat2003.
This will not double the memory (since you'll just be using the same model), but yes, it will increase the compute time for 2 model calls.
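The two-step approach could look roughly like the sketch below. It assumes a `translate(text, target_language)` callable, for example the `translate_text_sarvam` function from the first post, and it assumes that passing "English" as the target language triggers Indic->English translation. Neither assumption is confirmed in this thread, and `translate_via_english` is a made-up name.

```python
def translate_via_english(text, target_language, translate):
    """Pivot translation: source -> English -> target_language.

    `translate` is any callable taking (text, target_language) and
    returning a string. The same loaded model serves both hops, so
    memory use stays flat while compute time roughly doubles.
    """
    if not text.strip():
        return ""
    # Hop 1: regional language -> English (assumes "English" is accepted)
    english = translate(text, "English")
    if not english:
        return ""  # first hop failed, nothing to pivot from
    if target_language == "English":
        return english
    # Hop 2: English -> target regional language
    return translate(english, target_language)
```

For Malayalam to Hindi this would be called as `translate_via_english(malayalam_text, "Hindi", translate_text_sarvam)`.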
The same happens on LM Studio.
Hey @GokulNC, what about IndicTrans3-beta, is it Indic to Indic? There was not much to go on on the page, but I tried and tested it from Malayalam to Hindi using Gradio, and it gives me this as output:
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>
<|assistant|>

import torch
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face token
HF_TOKEN = ""

# Load model and tokenizer
model_id = "ai4bharat/IndicTrans3-beta"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # avoid float16 to disable Triton
    device_map="cpu",  # force CPU if CUDA/Triton is unstable
    token=HF_TOKEN
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Translation helper
def translate_to_hindi(text: str) -> str:
    # Format the prompt
    prompt = f"Translate the following text to Hindi: {text}"
    conversation = [{"role": "user", "content": prompt}]

    # Tokenize using chat template
    input_ids = tokenizer.apply_chat_template(
        conversation,
        return_tensors="pt",
        add_generation_prompt=True
    ).to(model.device)

    # Trim if too long
    if input_ids.shape[1] > 4096:
        input_ids = input_ids[:, -4096:]

    # Generate translation
    output_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=512,
        do_sample=False,
        num_beams=1,
        repetition_penalty=1.1,
    )

    # Decode output
    translated = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
    return translated.strip()

# Gradio interface
demo = gr.Interface(
    fn=translate_to_hindi,
    inputs=gr.Textbox(label="Enter Regional Language Text"),
    outputs=gr.Textbox(label="Translation in Hindi"),
    title="Regional to Hindi Translator (IndicTrans3-beta)"
)

if __name__ == "__main__":
    demo.launch(debug=True)

A script like this