Why, inside `modeling_phi.py`, does the output of Self Attention not become the input of the MLP?

#94
by fahadh4ilyas - opened

Usually, the hidden_states output from self_attn becomes the input to mlp. But in modeling_phi.py, it seems that the hidden_states after the input layer norm are fed into both self_attn and mlp, and the two outputs are then added together at the end. What kind of transformer implementation is that?
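
Roughly, this is the pattern I mean (a simplified sketch, not the actual modeling_phi.py code; self_attn, mlp, and input_layernorm are placeholders for the real modules):

```python
import torch.nn as nn

class ParallelBlock(nn.Module):
    # Simplified sketch of the layer structure described above.
    def __init__(self, hidden_size, self_attn, mlp):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(hidden_size)
        self.self_attn = self_attn
        self.mlp = mlp

    def forward(self, hidden_states):
        residual = hidden_states
        hidden_states = self.input_layernorm(hidden_states)
        # The *same* normalized hidden_states feed both branches...
        attn_out = self.self_attn(hidden_states)
        mlp_out = self.mlp(hidden_states)
        # ...and the two branch outputs are summed with the residual at the end.
        return residual + attn_out + mlp_out
```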

Microsoft org

Hello @fahadh4ilyas !

Attention can also be applied in parallel with the MLP, instead of sequentially as in, e.g., GPT-2 or Llama.

Please check GPT-J/CodeGen's implementation: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gptj/modeling_gptj.py#L311.
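
For contrast, in the sequential layout the MLP consumes the attention branch's output and there are two residual additions. A minimal sketch (illustrative module names, not the exact GPT-2/Llama code):

```python
import torch.nn as nn

class SequentialBlock(nn.Module):
    # Sequential (GPT-2/Llama-style) layout: attention first, then the MLP
    # operates on the output of the attention residual branch.
    def __init__(self, hidden_size, self_attn, mlp):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(hidden_size)
        self.post_attention_layernorm = nn.LayerNorm(hidden_size)
        self.self_attn = self_attn
        self.mlp = mlp

    def forward(self, hidden_states):
        residual = hidden_states
        hidden_states = self.self_attn(self.input_layernorm(hidden_states))
        hidden_states = residual + hidden_states  # first residual add

        residual = hidden_states
        hidden_states = self.mlp(self.post_attention_layernorm(hidden_states))
        return residual + hidden_states  # second residual add
```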

gugarosa changed discussion status to closed
