Why are you using query/key layer norm AFTER rotary?
#11, opened by vince62s
My understanding is that standard practice is to apply LN before rotary. If we want to use flash decoding with the specific flash kernel, applying it after is an issue, because rotary is embedded in the kernel.
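To make the ordering concern concrete, here is a minimal pure-Python sketch. It is not this model's code: `layer_norm` (no learned weight or bias), `rope` (interleaved pairs, base 10000), and the sample vector `q` are hypothetical stand-ins chosen for illustration. It only shows that the two orders give different results, so a kernel that applies rotary internally cannot reproduce "norm after rotary" by normalizing beforehand.

```python
import math

def layer_norm(x, eps=1e-5):
    # Plain LayerNorm over one head vector; no learned weight/bias (assumption).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rope(x, pos, base=10000.0):
    # Rotary embedding on interleaved pairs (x[0], x[1]), (x[2], x[3]), ...
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

q = [0.5, -1.0, 2.0, 0.25]  # hypothetical query head vector
pos = 3                      # hypothetical token position

norm_then_rope = rope(layer_norm(q), pos)  # LN before rotary
rope_then_norm = layer_norm(rope(q, pos))  # LN after rotary

# Largest elementwise gap between the two orderings.
max_diff = max(abs(a - b) for a, b in zip(norm_then_rope, rope_then_norm))
print(max_diff)
```

With a fused kernel, only the first ordering can be expressed: the caller normalizes q/k and the kernel then applies rotary. The second ordering needs rotary to run outside the kernel.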