
!! "moe" - routed inference between 3 different models without any tying

experimental MoE with 3 experts totalling ~480M params; the router is roughly 70M params
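a rough sketch of what top-1 routed inference between separate models could look like. this is not the actual code for this repo: `route`, `toy_router` and the three toy experts are hypothetical stand-ins, and it assumes the router emits one score per expert and the prompt goes to the single highest-scoring expert.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(
    prompt: str,
    router: Callable[[str], Sequence[float]],
    experts: Sequence[Callable[[str], str]],
) -> Tuple[int, str]:
    """Score the prompt with the router, then run only the top-1 expert."""
    probs = softmax(router(prompt))
    # Index of the highest-probability expert (first one wins on ties).
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, experts[idx](prompt)


# Hypothetical stand-ins: a keyword-count "router" and three independent "experts".
# In a real setup these would be separate models that share no weights.
KEYWORDS = ("code", "math", "chat")


def toy_router(prompt: str) -> List[float]:
    return [float(prompt.lower().count(k)) for k in KEYWORDS]


toy_experts = [
    lambda p: f"[code expert] {p}",
    lambda p: f"[math expert] {p}",
    lambda p: f"[chat expert] {p}",
]

idx, out = route("help me with this math problem", toy_router, toy_experts)
print(idx, out)
```

only one expert runs per prompt here, so inference cost is the router plus the chosen expert rather than all three.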

no loss chart for this router, which was trained on 15 samples