Implements, from first principles, the recently proposed Dynamic Tanh (DyT) as an alternative to LayerNorm. Specifically, we trained a nanoGPT (0.8M params) on tiny shakespeare with conventional LayerNorm, RMSNorm, and Dynamic Tanh, then compared their performance. DyT seems to match the normalization baselines, or at least train stably, for α between 0.5 and 1.5, and it might outperform them if trained longer.
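For readers who have not seen the three layers side by side, here is a rough forward-pass sketch in NumPy. It is not the notebook's code. It assumes the commonly cited DyT form, y = γ · tanh(α · x) + β, with a scalar α and per-channel γ and β. The function names, the tensor shape, and the choice of α = 0.5 (the low end of the range above) are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Standardize each token's features to zero mean and unit variance,
    # then apply a per-channel scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-5):
    # Rescale by the root mean square of the features; no mean subtraction.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return gamma * x / rms

def dynamic_tanh(x, alpha, gamma, beta):
    # Assumed DyT form: an element-wise squashing with a scalar alpha
    # (learnable in training), with no statistics computed over the input.
    return gamma * np.tanh(alpha * x) + beta

# Toy activations: (batch, sequence, channels). Shapes are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 8))
gamma = np.ones(8)   # per-channel scale
beta = np.zeros(8)   # per-channel shift

y_ln = layer_norm(x, gamma, beta)
y_rms = rms_norm(x, gamma)
y_dyt = dynamic_tanh(x, 0.5, gamma, beta)
```

The point of the comparison: LayerNorm and RMSNorm both reduce over the channel axis, while DyT is purely element-wise, so its output is bounded by the tanh rather than by any per-token statistic.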
Code: https://github.com/Jaykef/ai-algorithms/blob/main/Dynamic_Tanh.ipynb
Background music by 周子珺