datasets: | |
- c4 | |
language: | |
- en | |
CoreX models are Llama models in which the first X decoder layers are kept, and then the model is finetuned on 1 billion tokens from some dataset. Base model stems from Llama2-7b, medium from Llama2-13b, xl from Llama2-70b. |