This model is "Built with Llama".

It is based on meta-llama/Meta-Llama-3.1-8B-Instruct and was created with the help of mergekit. This is the mergekit configuration we used: mergekit_moe_config.yml

This is the raw model, taken directly after merging. Its router networks are still randomly initialized, so it will not perform better than any single one of its expert models. This model requires further training before use.
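Why an untrained router cannot help can be illustrated with a toy example. The following is a minimal sketch, not the actual model code. It assumes that all eight experts are identical copies of the same feed-forward block and that routing is Mixtral-style top-2 with renormalized weights; neither is stated explicitly above. The `expert` function, the sizes, and the router initialization are hypothetical stand-ins.

```python
import math
import random


def expert(x):
    # Hypothetical stand-in for one expert's feed-forward block.
    # Assumption: every expert is an identical copy, so one function serves all.
    return [2.0 * v + 1.0 for v in x]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, top_k=2):
    # Router logits: one dot product per expert.
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(logits)
    # Pick the top-k experts and renormalize their weights to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(x)
    for i in top:
        weight = probs[i] / norm
        y = expert(x)  # identical for every i under the assumption above
        out = [o + weight * v for o, v in zip(out, y)]
    return out, top


random.seed(0)
hidden_size, num_experts = 4, 8  # toy sizes, not the real model dimensions
router = [[random.gauss(0.0, 0.02) for _ in range(hidden_size)]
          for _ in range(num_experts)]

x = [0.5, -1.0, 2.0, 0.1]
mixed, chosen = moe_forward(x, router)
single = expert(x)
print("experts chosen by the random router:", chosen)
print("max difference to a single expert:",
      max(abs(a - b) for a, b in zip(mixed, single)))
```

Under these assumptions the routing weights sum to 1 and every expert returns the same output, so the mixture equals a single expert up to floating-point rounding, whichever experts the random router picks. The router only becomes useful once further training lets the experts and the routing specialize.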

This model has a total of 47.5B params, slightly more than Mixtral 8x7B with its 46.7B params.
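The total can be roughly reproduced with a back-of-the-envelope count. This is a sketch under stated assumptions: the dimensions are those of the published Llama 3.1 8B configuration, and the layout is assumed to be Mixtral-style, meaning only the feed-forward (MLP) blocks are replicated per expert while attention, embeddings, and norms are shared, with one bias-free linear router per layer.

```python
# Dimensions from the Llama 3.1 8B configuration.
hidden = 4096
intermediate = 14336
layers = 32
heads = 32
kv_heads = 8
vocab = 128256
num_experts = 8

head_dim = hidden // heads
kv_dim = kv_heads * head_dim

# Per-layer blocks (no biases).
mlp = 3 * hidden * intermediate                   # gate, up, and down projections
attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, plus k and v
norms = 2 * hidden                                # two RMSNorm weights per layer
router = hidden * num_experts                     # assumed: one linear gate per layer

# Input embeddings and output head are untied; one final norm.
embeddings = 2 * vocab * hidden
final_norm = hidden

dense_total = layers * (mlp + attn + norms) + embeddings + final_norm
moe_total = (layers * (num_experts * mlp + attn + norms + router)
             + embeddings + final_norm)

print(f"dense 8B model: {dense_total / 1e9:.2f}B params")
print(f"8-expert MoE:   {moe_total / 1e9:.1f}B params")
```

Almost all of the growth comes from the replicated MLP blocks. The shared attention and embedding weights are counted once, which is why the total stays far below 8 times the size of the dense model.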

Licensing

This model is licensed under the Llama 3.1 Community License, Copyright (c) 2024 Philip May, Deutsche Telekom AG
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

Model size: 47.5B params (Safetensors), tensor type: BF16

Model: deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw