- Code release: https://github.com/microsoft/torchscale
- March 2022: released preprint [DeepNet: Scaling Transformers to 1,000 Layers](https://arxiv.org/abs/2203.00555)

```
@article{deepnet,
  author  = {Hongyu Wang and Shuming Ma and Li Dong and Shaohan Huang and Dongdong Zhang and Furu Wei},
  title   = {{DeepNet}: Scaling {Transformers} to 1,000 Layers},
  journal = {CoRR},
  volume  = {abs/2203.00555},
  year    = {2022},
}
```