Jina Clip V2 `encode_image` latency is proportional to batch size

#33
by lssatvik - opened

I am not able to observe any GPU parallelism benefit from batching when encoding images to retrieve embeddings.

The `encode_image` method takes a `batch_size` parameter, but processing time scales linearly with batch size. There seems to be no point in supplying `batch_size` at all, since increasing the batch also increases the GPU memory consumed.

- 1 call, batch size 1: `t` ms, `m` additional GPU memory
- 10 calls, batch size 1: `10t` ms, `m` additional GPU memory
- 1 call, batch size 10: `10t` ms, `m*1.2` additional GPU memory
- 1 call, batch size 90: `90t` ms, `m*4` additional GPU memory

-> 10 calls at batch size 1 come out no worse than 1 call at batch size 10, while using less memory.

Is this expected? Why does a bigger batch take proportionally more time? Shouldn't I see some reduction in time per image? Which aspect of the model governs this: the matrix sizes, or the number of GPU cores?
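For reference, a minimal timing harness along these lines might look as follows. The `time_encode` helper and the `dummy_encode` stand-in are hypothetical (not from this thread); `encode_fn` is assumed to accept `(images, batch_size=...)` like `model.encode_image`, and the dummy lets the harness run without a GPU or model download:

```python
import time

def time_encode(encode_fn, images, batch_size, repeats=3):
    """Return the best wall-clock time in ms over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode_fn(images, batch_size=batch_size)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best

def dummy_encode(images, batch_size=1):
    # Hypothetical stand-in for model.encode_image:
    # returns one fake embedding per image.
    return [[0.0] * 8 for _ in images]

images = [f"img_{i}.png" for i in range(90)]
for bs in (1, 10, 90):
    print(f"batch_size={bs}: {time_encode(dummy_encode, images, bs):.3f} ms")
```

With the real model, swapping `dummy_encode` for `model.encode_image` would give comparable per-batch-size numbers.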

Hey @lssatvik , thanks for reaching out! Can you share a code snippet, so I can reproduce this?

Jina AI org

I couldn't reproduce this. It would help to know how you are using the model, because the overhead might be coming from downloading or transforming the images.
