kv cache #6
opened by FrankWu
In the paper, I see the model caches the compressed KV, but in the model file, it seems to still cache the legacy full per-head key/value states:
```python
key_states = k_pe.new_empty(bsz, self.num_heads, q_len, self.q_head_dim)
key_states[:, :, :, : self.qk_nope_head_dim] = k_nope
key_states[:, :, :, self.qk_nope_head_dim :] = k_pe
if past_key_value is not None:
    cache_kwargs = {"sin": sin, "cos": cos}  # Specific to RoPE models
    key_states, value_states = past_key_value.update(
        key_states, value_states, self.layer_idx, cache_kwargs
    )
```
Do I misunderstand something?
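For comparison, here is a rough sketch of what caching only the compressed latent would look like, as the paper describes. The class name `CompressedKVCache` and the dimensions in the comments (kv_lora_rank = 512, qk_rope_head_dim = 64) are my assumptions based on the paper, not the repo's API:

```python
import torch

# Hypothetical cache that stores only the low-rank latent c^KV and the shared
# decoupled RoPE key, instead of the full per-head key/value states.
class CompressedKVCache:
    def __init__(self):
        self.compressed_kv = None  # (bsz, seq_len, kv_lora_rank), e.g. 512
        self.k_pe = None           # (bsz, seq_len, qk_rope_head_dim), e.g. 64

    def update(self, compressed_kv, k_pe):
        # Append the new tokens' latents along the sequence dimension.
        if self.compressed_kv is None:
            self.compressed_kv, self.k_pe = compressed_kv, k_pe
        else:
            self.compressed_kv = torch.cat([self.compressed_kv, compressed_kv], dim=1)
            self.k_pe = torch.cat([self.k_pe, k_pe], dim=1)
        return self.compressed_kv, self.k_pe
```

Keys and values would then be reconstructed from the cached latents at attention time, or the up-projections absorbed as discussed below.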
I also found this problem. In addition, the up-projection matrices of K and V in the code are not absorbed into the Q and O projection matrices, even though the paper notes they can be.
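To illustrate the absorption: since q_nope^T (W^UK c) = ((W^UK)^T q_nope)^T c, W^UK can be folded into the query side so that attention scores are computed directly against the cached latents (and W^UV can likewise be folded into the output projection). A rough sketch of the score computation; the per-head layout of `w_uk` and the function name are my assumptions:

```python
import torch

def absorbed_attention_scores(q_nope, w_uk, compressed_kv):
    # q_nope:        (bsz, num_heads, q_len, qk_nope_head_dim)
    # w_uk:          (num_heads, qk_nope_head_dim, kv_lora_rank), a hypothetical
    #                per-head reshape of the key half of kv_b_proj
    # compressed_kv: (bsz, kv_len, kv_lora_rank)

    # Fold W^UK into the query once per step...
    q_latent = torch.einsum("bhqd,hdr->bhqr", q_nope, w_uk)
    # ...then score directly against the cached latents; no per-token key expansion.
    return torch.einsum("bhqr,bkr->bhqk", q_latent, compressed_kv)
```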
+1
+1