deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw

This model is "Built with Llama".

It is based on meta-llama/Meta-Llama-3.1-8B-Instruct and was created with mergekit. The mergekit configuration we used is in mergekit_moe_config.yml.

Note that this is the raw model after merging. Its router networks are still randomly initialized, so it will not perform better than any single one of its expert models. This model requires further training before use.
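To illustrate why a raw merge gains nothing over one expert, here is a minimal sketch in plain Python. It rests on assumptions that are not stated in this card: that every expert starts as an identical copy of the same feed-forward block, and that the router uses Mixtral-style top-2 gating with a softmax over the selected logits. The names `top_k_route`, `expert`, and `moe_layer` and the toy sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def top_k_route(hidden, router_weights, k=2):
    """Return (expert_index, gate_weight) pairs for one token vector."""
    # One logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    # Keep the k experts with the highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the gate weights sum to 1.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def expert(hidden):
    # Stand-in for an expert MLP. Assumption: all experts are identical copies,
    # so a single function represents every one of them.
    return [2.0 * h + 1.0 for h in hidden]


def moe_layer(hidden, router_weights, k=2):
    """Gate-weighted sum of the selected experts' outputs."""
    out = [0.0] * len(hidden)
    for _, gate in top_k_route(hidden, router_weights, k):
        out = [o + gate * y for o, y in zip(out, expert(hidden))]
    return out


rng = random.Random(0)
hidden_size, num_experts = 16, 8  # toy sizes, not the real model dimensions
# Randomly initialized router, as in the raw merged model.
router_weights = [
    [rng.gauss(0.0, 0.02) for _ in range(hidden_size)] for _ in range(num_experts)
]
token = [rng.gauss(0.0, 1.0) for _ in range(hidden_size)]

routes = top_k_route(token, router_weights)
moe_out = moe_layer(token, router_weights)
dense_out = expert(token)
```

Under these assumptions the random router picks two experts arbitrarily, but because the gate weights sum to 1 and the experts are identical, `moe_out` matches `dense_out` up to floating-point error. The experts and the router only become useful once further training lets them diverge.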

This model has 47.5B parameters in total, slightly more than Mixtral 8x7B with its 46.7B parameters.
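The two totals can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes a Mixtral-style layout (shared attention, one router and eight expert MLPs per layer, untied input and output embeddings, no biases) and uses the publicly documented dimensions of Llama 3.1 8B and Mixtral 8x7B. The function name `moe_param_count` is hypothetical, and the dimensions are not taken from this card.

```python
def moe_param_count(vocab_size, hidden, intermediate, layers,
                    num_heads, num_kv_heads, num_experts):
    """Estimate the parameter count of a Mixtral-style MoE transformer."""
    head_dim = hidden // num_heads
    kv_dim = num_kv_heads * head_dim            # grouped-query attention
    embeddings = 2 * vocab_size * hidden        # input embeddings + untied output head
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, then k and v
    expert_mlp = 3 * hidden * intermediate      # gate, up and down projections
    router = hidden * num_experts               # one linear layer per MoE block
    norms = 2 * hidden                          # two RMSNorm weights per layer
    per_layer = attention + num_experts * expert_mlp + router + norms
    return embeddings + layers * per_layer + hidden  # + final RMSNorm


# Assumed dimensions from the published Llama 3.1 8B configuration.
llama_moe = moe_param_count(vocab_size=128256, hidden=4096, intermediate=14336,
                            layers=32, num_heads=32, num_kv_heads=8, num_experts=8)

# Assumed dimensions from the published Mixtral 8x7B configuration.
mixtral = moe_param_count(vocab_size=32000, hidden=4096, intermediate=14336,
                          layers=32, num_heads=32, num_kv_heads=8, num_experts=8)

print(f"Llama-3.1-MoE-8x8B: {llama_moe / 1e9:.1f}B")
print(f"Mixtral 8x7B:       {mixtral / 1e9:.1f}B")
```

With these dimensions the per-layer parameter count is the same for both models, so the difference comes entirely from the larger Llama 3.1 vocabulary (128,256 versus 32,000 tokens) in the embedding and output matrices.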

Licensing

This model is licensed under the Llama 3.1 Community License, Copyright (c) 2024 Philip May, Deutsche Telekom AG
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
