Jaward Sesay's picture

Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Recent Activity

View all activity

Organizations

MLX Community's profile picture

Posts 80

view post
Post
818
Implements from first-principle recently proposed dynamic tanh as alternative to layernorm. Specifically, we trained a nanoGPT (0.8 M params) on tiny shakespeare with conventional layernorm, RMSNorm and dynamic tanh, then compared performances. Observed performance seems to match or is stable for α = 0.5~ 1.5, might outperform if trained longer.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/Dynamic_Tanh.ipynb
Background music by 周子珺

Articles 4

Article
2

In Honour of This Year's NeurIPs Test of Time Paper Awardees

datasets

None public yet