Batch inference on GPU

#3 · by Pavelrst

Hey, quick question: is this model supposed to run faster on GPU when batch_size > 1?
I've tried running it with batch_size = 2, 4, and 8 and measured the time of .forward, but inference was always slightly slower than with batch_size = 1.
Any idea why?
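
For reference, here's a minimal sketch of how I'm measuring (the checkpoint name, sequence length, and iteration counts are placeholders, not the real values). I call torch.cuda.synchronize() before starting and stopping the clock, since CUDA kernels are launched asynchronously and otherwise the timer can stop before the GPU work actually finishes:

```python
import time

import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint name; substitute the actual model under discussion.
model = AutoModel.from_pretrained("model-name").cuda().eval()
tokenizer = AutoTokenizer.from_pretrained("model-name")

@torch.no_grad()
def time_forward(batch_size, seq_len=128, n_iters=20):
    # Random token IDs stand in for real inputs; shapes are what matter here.
    ids = torch.randint(0, tokenizer.vocab_size, (batch_size, seq_len), device="cuda")
    # Warm-up passes so one-time kernel setup doesn't skew the measurement.
    for _ in range(3):
        model(input_ids=ids)
    torch.cuda.synchronize()  # drain queued kernels before starting the clock
    start = time.perf_counter()
    for _ in range(n_iters):
        model(input_ids=ids)
    torch.cuda.synchronize()  # kernels are async; sync again before stopping
    elapsed = (time.perf_counter() - start) / n_iters
    print(f"batch_size={batch_size}: {elapsed * 1000:.1f} ms/batch, "
          f"{elapsed * 1000 / batch_size:.1f} ms/sample")

for bs in (1, 2, 4, 8):
    time_forward(bs)
```

With this setup I'd expect ms/batch to grow sub-linearly with batch size, i.e. ms/sample should drop if batching is helping, but that's not what I'm seeing.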
