Korean data ratio in pretraining datasets.

#78
by Korabbit - opened

When I looked through the paper, I found no report of the percentage of Korean data.
What is the percentage of Korean data?

I have the same question. They say the model outperforms Llama 2 13B on all benchmarks, but it does not seem to support Korean or Vietnamese.

@Korabbit Did you find an answer to this?

@RoiandDae No, I couldn't find an answer.
