---
license: mit
widget:
- text: "বহুল আলোচিত দশম জাতীয় সংসদ"
- text: "গাজীপুরের কালিয়াকৈর উপজেলার তেলিরচালা"
---
This Bangla GPT2 model was trained on a Bangla newspaper corpus: roughly 250 MB of Prothom Alo article text, with a tokenizer vocabulary size of 50k.
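The card does not include the tokenizer training pipeline, but a minimal sketch of building a 50k-vocab tokenizer over such a corpus might look like the following. This assumes the Hugging Face `tokenizers` library with byte-level BPE, and the corpus file name `prothom_alo.txt` is a hypothetical placeholder; the actual training setup may differ.

```python
# A minimal sketch, assuming byte-level BPE via the `tokenizers` library;
# the corpus path below is a hypothetical placeholder.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["prothom_alo.txt"],        # hypothetical path to the ~250 MB corpus
    vocab_size=50000,                 # 50k vocabulary, as stated above
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model("bangla_gpt2_tokenizer")
```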
```python
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

# Load the tokenizer and the TensorFlow model weights from the Hub
tokenizer = GPT2Tokenizer.from_pretrained("saiful9379/Bangla_GPT2")
model = TFGPT2LMHeadModel.from_pretrained("saiful9379/Bangla_GPT2")

text = "বহুল আলোচিত দশম জাতীয় সংসদ"
input_ids = tokenizer.encode(text, return_tensors='tf')
print(input_ids)

# Generate up to 175 tokens with beam search, returning 5 candidate sequences.
# Note: temperature only takes effect when sampling (do_sample=True);
# with pure beam search it is ignored.
output = model.generate(
    input_ids,
    max_length=175,
    num_beams=10,
    temperature=0.7,
    no_repeat_ngram_size=2,
    num_return_sequences=5
)

# Decode the top-scoring candidate
predicted_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(predicted_text)
```
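Because `num_return_sequences=5`, `output` holds five candidate continuations, but only the first is decoded above. To inspect all of them, a small extension (not part of the original example) is:

```python
# Decode every returned beam, not just the best-scoring one
for i, sequence in enumerate(output):
    print(f"--- candidate {i} ---")
    print(tokenizer.decode(sequence, skip_special_tokens=True))
```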