Question about M-RoPE

#25
by JavenChen - opened

I have been reading the code that implements M-RoPE, especially this part:

image.png

Do you guys only use height and width for the rotary position embedding?

Take the patch at (1,1) in frame 0 and the patch at (1,1) in frame 1: when attention is computed, is their encoded relative distance 0?

My mistake. This rotary_pos_emb is for the vision model. It makes sense now!
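
For anyone who lands here with the same question, below is a minimal sketch of what a rotary embedding built only from height and width does to attention scores. It is a toy in pure Python and not the model's actual code: `rope_angles`, `rotate_pairs`, `rope_2d`, the channel split (first half for rows, second half for columns), the base of 10000, and the vectors `q` and `k` are all assumptions made for illustration. Under those assumptions, two patches at the same (row, col) in different frames receive identical rotations, so the rotary term contributes no relative offset between them.

```python
import math

def rope_angles(pos, dim, base=10000.0):
    # One angle per channel pair: pos * base^(-2i/dim). The base is an assumed value.
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

def rotate_pairs(vec, angles):
    # Rotate consecutive channel pairs (x0, x1), (x2, x3), ... by the given angles.
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

def rope_2d(vec, h, w):
    # Hypothetical layout: first half of the channels encodes the row index,
    # second half encodes the column index. No frame index is used at all.
    half = len(vec) // 2
    return (rotate_pairs(vec[:half], rope_angles(h, half))
            + rotate_pairs(vec[half:], rope_angles(w, half)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Arbitrary toy query and key vectors with 8 channels.
q = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]
k = [1.0, 0.2, -0.8, 0.4, 0.6, -0.3, 0.5, 0.7]

# Patches are written as (frame, row, col).
patch_a = (0, 1, 1)
patch_b = (1, 1, 1)  # same spatial position, next frame
patch_c = (0, 1, 2)  # same frame, neighbouring column

def score(p_query, p_key):
    # Only row and col are passed on; the frame entry is ignored.
    return dot(rope_2d(q, p_query[1], p_query[2]),
               rope_2d(k, p_key[1], p_key[2]))

same_frame = score(patch_a, patch_a)
cross_frame = score(patch_a, patch_b)
neighbour = score(patch_a, patch_c)

print(same_frame == cross_frame)               # frame index makes no difference
print(abs(same_frame - dot(q, k)) < 1e-9)      # equal rotations cancel in the dot product
print(abs(neighbour - same_frame) > 1e-3)      # a column offset does change the score
```

In this sketch the frame never enters the angles, which matches the relative distance of 0 asked about above for an embedding that sees only height and width.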
