Dataset used for the LoRA?

#2
by chau9ho - opened

Hi Joseph, I'm currently gathering a dataset for fine-tuning Cantonese LLMs with QLoRA. Could you shed some light on what kind of dataset you used for the "lora" model? Thanks!!

Hi, this model was trained on a translated version of the OASST dataset, but the results were not good. We've found two factors that might have affected it: 1) Llama 2 isn't fluent in Cantonese (https://hon9kon9ize.com/posts/2023-12-18-llm-finetuning1); 2) the SFT dataset translated from Chinese (zh) contains a lot of mistakes. We are working on an open-source Cantonese SFT dataset and will publish it soon. Contact me if you're interested in learning more: [email protected]
