100% GPU memory usage
Hello,
First of all, thank you for making this repo and for all of your hard work.
I was using this model with a Seq2SeqTrainer and hitting the memory limit (RTX 3060, 12 GB), which was not the case for t5 and mt5. I have been using the same training args in all cases except for batch_size = 64/48/32, but for banglat5 I had to set batch_size = 16.
Is there any way to optimize the GPU memory usage?
Thank you,
This is possibly a tokenization issue, since banglat5 has the exact same architecture as t5. In fact, banglat5 should have lower memory requirements, because its tokenizer produces fewer tokens than mt5 for the same Bangla text.
Check that you're using the right tokenizer (the one in this repo).
It may also be worthwhile to explicitly set the max_length, truncation, and padding arguments when calling the tokenizer, so that a few very long examples can't blow up the batch size in tokens; see the sketch below.
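A minimal sketch of what that could look like, assuming the tokenizer is loaded from the "csebuetnlp/banglat5" checkpoint on the Hugging Face Hub and that the column names ("source_text", "target_text") and the max_length of 128 are placeholders for your own setup:

```python
from transformers import AutoTokenizer

# Assumed checkpoint name; use whatever this repo's README points to.
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglat5")

MAX_LENGTH = 128  # placeholder; pick a value that covers most of your data


def preprocess(batch):
    # Explicit truncation + max_length caps the sequence length,
    # so one unusually long example can't inflate GPU memory use.
    model_inputs = tokenizer(
        batch["source_text"],        # hypothetical column name
        max_length=MAX_LENGTH,
        truncation=True,
        padding="max_length",        # or leave padding to a data collator
    )
    labels = tokenizer(
        batch["target_text"],        # hypothetical column name
        max_length=MAX_LENGTH,
        truncation=True,
        padding="max_length",
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```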
Good Luck.