Why, inside `modeling_phi.py`, is the output of Self Attention not becoming the input of the MLP?
#94 · opened by fahadh4ilyas
Usually, the `hidden_states` from `self_attn` becomes the input to `mlp`. But in `modeling_phi.py`, it seems that the `hidden_states` after `input_norm` becomes the input to both `self_attn` and `mlp`, and the two outputs are then added at the end. What kind of transformer implementation is that?
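A minimal sketch of the pattern being described, assuming a generic PyTorch decoder layer (the class and attribute names below are illustrative, not the exact `modeling_phi.py` code):

```python
import torch.nn as nn


class ParallelBlockSketch(nn.Module):
    """Simplified parallel attention/MLP block (illustrative only)."""

    def __init__(self, hidden_size, self_attn: nn.Module, mlp: nn.Module):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(hidden_size)
        self.self_attn = self_attn
        self.mlp = mlp

    def forward(self, hidden_states):
        residual = hidden_states
        # One shared LayerNorm output feeds BOTH branches.
        normed = self.input_layernorm(hidden_states)
        attn_out = self.self_attn(normed)  # attention branch
        mlp_out = self.mlp(normed)         # MLP branch, NOT fed by attn_out
        # Both branch outputs are summed with the residual at the end.
        return residual + attn_out + mlp_out
```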
Hello @fahadh4ilyas !
Attention can also be applied in parallel with the MLP, instead of sequentially as in, e.g., GPT or Llama.
Please check GPT-J/CodeGen's implementation: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gptj/modeling_gptj.py#L311.
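For contrast, here is a sequential block in the GPT-2/Llama style, where the MLP consumes the attention output. The parallel formulation above instead sums both branch outputs with the residual, which allows the two branches to be computed concurrently (the class below is a simplified sketch, not code from any specific model):

```python
import torch.nn as nn


class SequentialBlockSketch(nn.Module):
    """Simplified sequential block (GPT-2/Llama style) for comparison."""

    def __init__(self, hidden_size, self_attn: nn.Module, mlp: nn.Module):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(hidden_size)
        self.post_attention_layernorm = nn.LayerNorm(hidden_size)
        self.self_attn = self_attn
        self.mlp = mlp

    def forward(self, hidden_states):
        # Attention sub-block with its own residual connection.
        residual = hidden_states
        hidden_states = self.self_attn(self.input_layernorm(hidden_states))
        hidden_states = residual + hidden_states

        # MLP sub-block consumes the attention output.
        residual = hidden_states
        hidden_states = self.mlp(self.post_attention_layernorm(hidden_states))
        return residual + hidden_states
```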
gugarosa changed discussion status to closed