Commit 8b85a80 · 1 Parent(s): 3baa823
Update README.md

README.md CHANGED
@@ -40,7 +40,7 @@ However, I could not efficiently optimize the second feedforward network sub-lay
*On the left is the standard Series Attention and Feed-Forward Net Design (SAF) for transformer models. On the right is the Parallel Attention and Feed-Forward Net Design (PAF) used in transformer models like PaLM (Chowdhery et al., 2022) and Mesh-Transformers (Wang, 2021)*

-## Evaluation results of [PAF-RoBERTa-Large](https://huggingface.co/luffycodes/parallel-roberta-large)
+## Evaluation results of [PAF-RoBERTa-Large](https://huggingface.co/luffycodes/parallel-roberta-large)

When fine-tuned on downstream tasks, this model achieves the following results:
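The figure caption in the diff above carries the technical point of this README: in SAF, the feed-forward sub-layer consumes the attention sub-layer's output, while in PAF both sub-layers read the same input and their outputs are summed into one residual. Below is a minimal PyTorch sketch of the two block layouts. It assumes pre-LayerNorm residual blocks and, for PAF, a single LayerNorm shared by both branches as in PaLM; the class names, hyperparameters, and the use of `nn.MultiheadAttention` are illustrative stand-ins, not code from this repository.

```python
import torch
import torch.nn as nn


class SAFBlock(nn.Module):
    """Series design: the FFN sub-layer consumes the attention sub-layer's output."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.ln1(x)
        h = x + self.attn(z, z, z)[0]      # attention sub-layer runs first ...
        return h + self.ffn(self.ln2(h))   # ... then the FFN reads its output


class PAFBlock(nn.Module):
    """Parallel design: attention and FFN both read the same normalized input."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        # One LayerNorm feeding both branches (PaLM-style assumption)
        self.ln = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.ln(x)
        # Independent branches, summed into a single residual connection
        return x + self.attn(z, z, z)[0] + self.ffn(z)
```

Because the two branches in `PAFBlock` no longer depend on each other, their large matrix multiplications can be fused or executed concurrently, which is the throughput argument Chowdhery et al. (2022) give for adopting the parallel formulation in PaLM.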