Batch inference

#4
by Pavelrst

Hey, quick question: is this model supposed to run faster on GPU when batch_size > 1?
I've tried running it with batch_size = 2, 4, and 8 and measured the time of .forward, but the forward pass was always slightly slower.
Any idea why?

I don't know, to be honest. It probably depends on your GPU VRAM.
