**⚠Warning⚠ This is an experimental weight. It may not perform well enough for practical use.**<br>
**To use this weight, you must also manually rewrite or replace the model file.**<br>

The model file is available here:<br>
https://github.com/lucidrains/BS-RoFormer

BS-RoFormer has received its first architectural update in a while.<br>
The 0.5.x releases introduce a mechanism called "Value Residual Learning" (https://arxiv.org/abs/2410.17897).<br>
The paper argues that this mechanism alleviates attention concentration (attention over-focusing on a few tokens) and further mitigates the vanishing gradient problem.
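
As a rough illustration of the idea, the following pure-Python sketch blends each later layer's attention values with the values computed in the first layer. It is not the code from the BS-RoFormer repository: the function names, the single-head attention, and the fixed mixing coefficient `lam` are all assumptions made for illustration, and real implementations may learn the mixing weight instead.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)


def mix_values(v_current, v_first, lam=0.5):
    """Blend this layer's values with the first layer's values.

    `lam` is a hypothetical fixed scalar chosen for this sketch.
    """
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_c, row_f)]
        for row_c, row_f in zip(v_current, v_first)
    ]


def forward(x, layers, lam=0.5):
    """Run a stack of attention layers with a value residual.

    `layers` is a list of (w_q, w_k, w_v) projection matrices.
    """
    v_first = None
    for w_q, w_k, w_v in layers:
        q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
        if v_first is None:
            v_first = v  # first layer: cache its values for later layers
        else:
            v = mix_values(v, v_first, lam)  # later layers: add the value residual
        out = attention(q, k, v)
        # Ordinary residual connection around the attention block.
        x = [[a + b for a, b in zip(row_x, row_o)] for row_x, row_o in zip(x, out)]
    return x
```

With `lam=1.0` the sketch reduces to a plain attention stack, and with `lam=0.0` every later layer attends over the first layer's values only, which makes the effect of the extra connection easy to compare.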