100% GPU memory usage
Hello,
First of all, thank you for making this repo and for all of your hard work.
I was using this model with a Seq2SeqTrainer and hitting the memory limit (RTX 3060, 12 GB), which was not the case for t5 and mt5. I have been using the same training args in all cases except for batch_size = 64/48/32, but for banglat5 I had to set batch_size = 16.
Is there any way to optimize the GPU memory usage?
Thank you,
This is possibly a tokenization issue, since banglat5 has the exact same architecture as t5. In fact, banglat5 should have lower memory requirements, because its tokenizer produces fewer tokens than mt5 for the same Bangla text.
Check that you're using the right tokenizer (the one in this repo).
It may also be worthwhile to explicitly set the max_length, truncation, and padding arguments when calling the tokenizer, so that a few very long examples can't blow up the batch size in tokens; see the sketch below.
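A minimal sketch of what that could look like, assuming the tokenizer is loaded from the "csebuetnlp/banglat5" checkpoint on the Hugging Face Hub and that the column names ("source_text", "target_text") and the max_length of 128 are placeholders for your own setup:

```python
from transformers import AutoTokenizer

# Assumed checkpoint name; use whatever this repo's README points to.
tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglat5")

MAX_LENGTH = 128  # placeholder; pick a value that covers most of your data


def preprocess(batch):
    # Explicit truncation + max_length caps the sequence length,
    # so one unusually long example can't inflate GPU memory use.
    model_inputs = tokenizer(
        batch["source_text"],        # hypothetical column name
        max_length=MAX_LENGTH,
        truncation=True,
        padding="max_length",        # or leave padding to a data collator
    )
    labels = tokenizer(
        batch["target_text"],        # hypothetical column name
        max_length=MAX_LENGTH,
        truncation=True,
        padding="max_length",
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```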
Good Luck.