Jina Clip V2: Inconsistent Embeddings
A concerning observation: changing the Jina embedding batch size slightly changes the embeddings.
emb(A): a1
emb([A,A]): [a2,a2]
emb([A,A,A]): [a3,a3,a3]
emb([A,B]): [a4, b4]
a1, a2, a3, and a4 are very similar, but not identical. The cosine similarities among a1, a2, a3, and a4 are close to 1.
This happens with/without xformers.
I don't think attention causes this, but does anyone have an idea? Is this expected?
The embeddings compared are the full 1024-dimensional vectors and are not normalized.
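For reference, the comparison was along these lines (a minimal sketch; a1 and a4 here are random placeholders for the tensors above, not actual model outputs):

import torch
import torch.nn.functional as F

# Placeholder 1024-dim embeddings standing in for a1 and a4 above (not normalized)
a1 = torch.randn(1024)
a4 = a1 + 1e-4 * torch.randn(1024)  # slightly perturbed copy, for illustration only

print(F.cosine_similarity(a1, a4, dim=0))  # close to 1
print(torch.equal(a1, a4))                 # False
print((a1 - a4).abs().max())               # small but non-zero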
Hey @lssatvik, I need to take a closer look! Can you share a code snippet so I can reproduce this?
from transformers import AutoModel
import torch

# Load the model on CPU
model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True).to("cpu")
img = "https://www.shutterstock.com/image-photo/beautiful-sunset-wave-vibrant-translucent-600nw-2255651435.jpg"

# Encode the same image with batch size 1 and with batch size 2
x = model.encode_image([img], batch_size=1, convert_to_numpy=False, normalize_embeddings=False)
y = model.encode_image([img, img], batch_size=2, convert_to_numpy=False, normalize_embeddings=False)

# Compare the first embedding from each call -- prints False here
print(torch.equal(x[0], y[0]))
transformers==4.47.1
torch==2.5.1
Top-level versions, in case it is an issue with dependencies.
Hey @lssatvik, after taking a closer look, I would say this is expected. The model weights are in bf16, so when the model is moved to CPU they are cast to fp32, and this causes slight variations in the embeddings when the batch size changes. If you run in bf16 or fp16, the embeddings should be identical.
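For instance, a sketch of keeping the weights in bf16 on GPU (only torch_dtype and the device change relative to the snippet above):

import torch
from transformers import AutoModel

# Load the checkpoint weights in bf16 instead of the default fp32 cast
model = AutoModel.from_pretrained(
    "jinaai/jina-clip-v2", trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda")
print(next(model.parameters()).dtype)  # torch.bfloat16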
But @gmastrapas, my understanding is that the conversion from bf16 to fp32 is done at the vector level, not the batch level. Even if there are variations, those variations would not depend on the other vectors in the batch, no?
bf_16_to_fp_32([x,x]) must be equal to [bf_16_to_fp_32(x), bf_16_to_fp_32(x)].
In my sample code, I haven't converted them to fp32. They remain bf16 tensors stored on CUDA. They are not the same even at this stage, so the conversion couldn't be the cause.
In the sample code, you are moving the model to the CPU. BFloat16 and Float16 are not supported on CPU, so when moving to CPU the model weights automatically go to Float32.
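A quick way to check what the weights actually end up as (a sketch that just inspects a parameter dtype before and after the move):

from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)
print(next(model.parameters()).dtype)  # dtype as loaded
model = model.to("cpu")
print(next(model.parameters()).dtype)  # dtype after the move to CPU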
import torch

# Two random bf16 vectors, cast to fp32 individually vs. as one concatenated batch
a = torch.randn(1, 512, dtype=torch.bfloat16)
b = torch.randn(1, 512, dtype=torch.bfloat16)
x = a.to(torch.float32)
y = b.to(torch.float32)
z = torch.concat((a, b)).to(torch.float32)

# The cast is element-wise, so both comparisons print True
print(torch.equal(x.flatten(), z[0].flatten()), torch.equal(y.flatten(), z[1].flatten()))
I can't see batch size causing a difference in the output here. Could you share some resource or code that demonstrates this phenomenon of different output depending on batch size?
Didn't see your earlier comment. I tested entirely on GPU (cuda==12.4).
It's still unequal.
In the initial sample code, the model ran on CPU, agreed, but the predictions were done afterwards. All predictions are happening in fp32, and this still should not depend on the batch, but solely on the vector.
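As a side check of that assumption, here is a toy sketch with a plain linear layer (nothing Jina-specific; whether the two outputs match bitwise can depend on the GPU and which kernels get selected for each batch size):

import torch

torch.manual_seed(0)
lin = torch.nn.Linear(1024, 1024).to("cuda")
x = torch.randn(1, 1024, device="cuda")

with torch.no_grad():
    out1 = lin(x)                    # the row on its own (batch size 1)
    out2 = lin(x.repeat(2, 1))[:1]   # the same row inside a batch of 2

print(torch.equal(out1, out2), (out1 - out2).abs().max())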
Can you check your model.dtype when on CUDA?
That is a bit strange. For me, in bf16 and fp16 the tensors are equal, and when using fp32 there is a max abs difference of tensor(4.2915e-06, device='cuda:0').
What GPU are you using? Are you using xformers for the image encoder? Can you calculate the max absolute difference across dtypes using torch.abs(x[0] - y[0]).max()?
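Something along these lines would help (same encode_image calls as in your snippet; the dtype loop is just for illustration):

import torch
from transformers import AutoModel

img = "https://www.shutterstock.com/image-photo/beautiful-sunset-wave-vibrant-translucent-600nw-2255651435.jpg"

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    model = AutoModel.from_pretrained(
        "jinaai/jina-clip-v2", trust_remote_code=True, torch_dtype=dtype
    ).to("cuda")
    x = model.encode_image([img], batch_size=1, convert_to_numpy=False, normalize_embeddings=False)
    y = model.encode_image([img, img], batch_size=2, convert_to_numpy=False, normalize_embeddings=False)
    # Report whether the embeddings match exactly and the max absolute difference
    print(dtype, torch.equal(x[0], y[0]), torch.abs(x[0] - y[0]).max())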