Content
Qwen2.5 0.5B fine-tuned on the Rayman-Extraction-Dataset
Remarks
Impressive performance for its size. However, because the model is so small, it is highly specialised and lacks generalisation, which in turn demands more dataset diversity. In this case, the model extracts data very well from single-sentence prompts and can infer the item currency from the price. However, it still extracts item_name as "Rayman fist" even when the item isn't mentioned at all, where the output is supposed to be "na", because the dataset doesn't contain sentences without the target data, i.e. sentences that do not mention "Rayman fist". The model is also incapable of extracting the price of "Rayman fist" when a user sentence buys/sells multiple items with individual prices, so I'm going to have to improve the model's reasoning and expand the dataset for this sentence type.
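To make the failure mode concrete, here is a hypothetical sketch of the extraction schema and the two problem cases described above. The field names (item_name, price, currency) and the example sentences are assumptions for illustration; the card does not specify the exact output format.

```python
import json

# Hypothetical examples of the target extraction schema (field names and
# sentences are illustrative, not taken from the actual dataset).
examples = [
    {
        # Happy path: single-sentence prompt, currency inferable from the price.
        "prompt": "I'll sell my Rayman fist for 20 USD.",
        "expected": {"item_name": "Rayman fist", "price": 20.0, "currency": "USD"},
    },
    {
        # Failure case: the item is never mentioned, so the desired output is
        # "na" -- but the current model still emits "Rayman fist" because the
        # dataset has no negative examples like this.
        "prompt": "I'm just browsing, nothing to sell today.",
        "expected": {"item_name": "na", "price": "na", "currency": "na"},
    },
]

for ex in examples:
    print(json.dumps(ex["expected"]))
```

Adding negative examples of the second kind to the dataset is the fix implied by the remarks above.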
Things to note
This model was trained for 12 epochs (I thought 6 would be sufficient, but apparently more is better for models < 3B) at a 1e-4 learning rate with a batch size of 8. One insight I found is that the model (raw safetensors, not quantised) performs very well at 12 epochs, albeit with some flaws due to dataset limitations and model capacity, leading to saturated quality and output, so I'm going to stick with these settings in future training.
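For reference, the stated hyperparameters can be collected as below. The key names follow Hugging Face TrainingArguments conventions as an assumption; the card does not say which training script or trainer was used.

```python
# Sketch of the training configuration described above. Key names mirror
# Hugging Face TrainingArguments, which is an assumption -- only the values
# (12 epochs, 1e-4 LR, batch size 8) come from the card itself.
training_config = {
    "num_train_epochs": 12,          # 6 was tried first; 12 worked better for < 3B
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,
}

print(training_config)
```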