luffycodes committed on
Commit 8b85a80
1 Parent(s): 3baa823

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -40,7 +40,7 @@ However, I could not efficiently optimize the second feedforward network sub-lay
 
 *On the left is the standard Series Attention and Feed-Forward Net Design (SAF) for transformers models. On the right is the Parallel Attention and Feed-Forward Net Design (PAF) used in transformer models like PaLM (Chowdhery et al., 2022) and Mesh-Transformers (Wang, 2021)*
 
-## Evaluation results of [PAF-RoBERTa-Large](https://huggingface.co/luffycodes/parallel-roberta-large).
+## Evaluation results of [PAF-RoBERTa-Large](https://huggingface.co/luffycodes/parallel-roberta-large)
 
 When fine-tuned on downstream tasks, this model achieves the following results:
 
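For context on the caption referenced in the diff: a PAF layer computes the attention and feed-forward branches from the same normalized input and sums both into one residual stream, rather than chaining them sequentially as in SAF. Below is a minimal, hypothetical PyTorch sketch of one such block; the class name, the single shared pre-norm, and the layer choices are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class PAFBlock(nn.Module):
    """Hypothetical sketch of one Parallel Attention + Feed-Forward (PAF) layer.

    Series (SAF):   attention sub-layer first, then the FFN sub-layer.
    Parallel (PAF): y = x + Attn(LN(x)) + FFN(LN(x)); both branches read the
    same normalized input, so they can be computed concurrently.
    """

    def __init__(self, dim: int, num_heads: int, ffn_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)  # one shared pre-norm (assumption)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_dim),
            nn.GELU(),
            nn.Linear(ffn_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)                  # normalize once
        attn_out, _ = self.attn(h, h, h)  # attention branch
        ffn_out = self.ffn(h)             # feed-forward branch, in parallel
        return x + attn_out + ffn_out     # single residual sum
```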