Dataset of lora?

by chau9ho - opened Jan 19

Jan 19

Hi Joseph, I'm currently gathering a dataset for fine-tuning Cantonese LLMs with QLoRA. Could you shed some light on what kind of dataset you used for the "lora" model? Thanks!!

indiejoseph

Owner Jan 19

Hi, this model used a translated OASST dataset, but the results were not good. We've found two factors that might have affected them: 1.) Llama2 isn't fluent in Cantonese (https://hon9kon9ize.com/posts/2023-12-18-llm-finetuning1), and 2.) the SFT dataset translated from zh contains a lot of mistakes. We are working on an open-source Cantonese SFT dataset and will publish it soon. Contact me if you're interested in learning more: [email protected]
