Question about M-RoPE
#25
by JavenChen · opened
I read the implementation of M-RoPE, especially the code here:
Do you only use height and width for the rotary position embedding?
Consider the patch at spatial position (1,1) in frame 0 and the patch at (1,1) in frame 1: when attention is computed, is their encoded relative distance 0?
My mistake. This rotary_pos_emb is for the vision model. It makes sense now!
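To illustrate the point resolved above, here is a minimal sketch (not the actual implementation; the function name and layout are hypothetical) of vision-side rotary position ids built only from height and width. Because there is no temporal component, the patch at (1,1) in frame 0 and the patch at (1,1) in frame 1 receive identical position ids, so their relative rotary distance is indeed 0:

```python
def vision_rope_pos_ids(num_frames, height, width):
    """Hypothetical sketch: per-patch (h, w) position ids for a vision
    encoder whose rotary embedding uses only spatial coordinates.

    Patches are laid out row-major within a frame, and the same spatial
    grid repeats for every frame (no temporal index)."""
    per_frame = [(h, w) for h in range(height) for w in range(width)]
    return per_frame * num_frames


ids = vision_rope_pos_ids(num_frames=2, height=4, width=4)

# Patch (1, 1) sits at row-major index 1*4 + 1 = 5 within each frame.
frame0_patch = ids[5]            # frame 0, patch (1, 1)
frame1_patch = ids[16 + 5]       # frame 1, patch (1, 1); frame stride = 4*4

# Identical position ids => zero relative rotary distance in attention.
assert frame0_patch == frame1_patch == (1, 1)
```

Since the rotary rotation angle is a function of the position id alone, equal ids mean the relative phase between the two patches cancels in the attention dot product, which is what the question was asking about.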