
An untrained precursor MoE created from Cosmo using mergekit.

Gate routing was initialized using the prompt hidden state method. Five experts are based on the visualized topic clusters of the Cosmopedia data; the other three are task-oriented. A sketch of such a configuration follows.
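
As a rough illustration, a mergekit-moe configuration using hidden-state gate initialization could look like the sketch below. The base model id, prompts, and per-expert details here are assumptions for illustration, not the actual config used for Lambent/cosmoem-8x1B:

```yaml
# Hypothetical mergekit-moe config sketch, assuming HuggingFaceTB/cosmo-1b
# as the Cosmo base. Prompts are illustrative placeholders.
base_model: HuggingFaceTB/cosmo-1b
gate_mode: hidden   # initialize router gates from hidden states of the positive prompts
dtype: float32
experts:
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "Write an educational article about a scientific topic."   # topic-cluster expert
  - source_model: HuggingFaceTB/cosmo-1b
    positive_prompts:
      - "Answer the following question step by step."              # task-oriented expert
  # ... six more experts, one per remaining topic cluster or task
```

The merge would then be produced with `mergekit-moe config.yml ./output-dir`.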

Layers 0, 1, and 2 were degenerate. The expert gates for those layers have been randomly initialized in the hope of mitigating this.
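
A minimal sketch of how that random re-initialization could be done, assuming the Mixtral-style parameter names that mergekit-moe emits; the shard file name and init scale are illustrative assumptions:

```python
# Hypothetical sketch: re-randomize the router gate weights for the
# degenerate layers (0, 1, 2) in a safetensors checkpoint.
import torch
from safetensors.torch import load_file, save_file

shard = "model-00001-of-00005.safetensors"  # assumption: shard holding the early layers
state = load_file(shard)
for layer in (0, 1, 2):
    # Mixtral-style gate key, an assumption about the checkpoint layout
    key = f"model.layers.{layer}.block_sparse_moe.gate.weight"
    state[key] = torch.empty_like(state[key]).normal_(mean=0.0, std=0.02)
save_file(state, shard)
```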

Model size: 10.2B params
Tensor type: F32 (Safetensors)
