MoE Experiments (proper sparse MoEs)
Based on SmolLM2 (a Llama-architecture model), MoE-ified and then further trained on a general dataset. A sketch of the routed MoE block is included below the config.
MoE layers: [8, 12, 16, 20, 24, 28]
Top-k: 2 (activates 50.0% of experts per token)
Hidden size: 960
Total parameters: 494,554,560
Trainable parameters: 494,554,560
Auxiliary loss weight: 0.01
code @ https://gist.github.com/cappuch/6a454ec8d2d349a27f9fd84f6ac90554
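A minimal sketch of what each routed layer could look like, assuming 4 experts per MoE layer (consistent with top-k 2 activating 50% of experts), a Llama-style gated FFN as the expert, and a Switch-style load-balancing auxiliary loss. Class and parameter names (`SparseMoE`, `LlamaMLP`, `intermediate_size=2560`) are illustrative and not taken from the gist above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LlamaMLP(nn.Module):
    """Llama-style gated FFN used here as a single expert (sizes assumed)."""
    def __init__(self, hidden_size=960, intermediate_size=2560):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

class SparseMoE(nn.Module):
    """Top-k routed mixture of experts with an auxiliary load-balancing loss."""
    def __init__(self, hidden_size=960, num_experts=4, top_k=2, aux_loss_weight=0.01):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        self.aux_loss_weight = aux_loss_weight
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(LlamaMLP(hidden_size) for _ in range(num_experts))

    def forward(self, x):
        batch, seq, hidden = x.shape
        flat = x.reshape(-1, hidden)                      # (tokens, hidden)
        logits = self.router(flat)                        # (tokens, experts)
        probs = logits.softmax(dim=-1)
        top_p, top_i = probs.topk(self.top_k, dim=-1)     # keep the k best experts per token
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)   # renormalize kept routing weights

        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (top_i == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += top_p[token_idx, slot].unsqueeze(-1) * expert(flat[token_idx])

        # Switch-style auxiliary loss: fraction of routing slots per expert
        # times that expert's mean router probability, scaled by num_experts.
        frac_tokens = F.one_hot(top_i, self.num_experts).float().mean(dim=(0, 1))
        mean_probs = probs.mean(dim=0)
        aux_loss = self.aux_loss_weight * self.num_experts * (frac_tokens * mean_probs).sum()

        return out.reshape(batch, seq, hidden), aux_loss
```

In the model described above, the dense MLP at layers 8, 12, 16, 20, 24, 28 would be swapped for a block like this (the remaining layers keep their original FFN), with the per-layer auxiliary losses summed and added to the LM loss at weight 0.01.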