license: apache-2.0
Mistralized TinyLlama, created because Flash Attention training on the Llama architecture with flash-attn is buggy.
It's based on the 3T-token base model (not the chat-tuned variant).
Not extensively tested.
Enjoy!
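
A rough usage sketch with transformers is shown below. The repo id is a placeholder (this card does not name one), and the flash-attn path assumes a recent transformers release with flash-attn installed on a CUDA device; adjust as needed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Hugging Face repo for this model.
repo_id = "your-username/tinyllama-mistralized"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    # Since the checkpoint uses the Mistral architecture, the flash-attn
    # implementation can be selected directly (requires flash-attn + CUDA).
    attn_implementation="flash_attention_2",
    device_map="auto",
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```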