Update README.md

Qwen2.5 0.5B finetuned on Rayman-Extraction-Dataset

### Remarks
Impressive performance for its size. However, because the model is so small it is highly specialised and lacks generalisation, which in turn calls for more dataset diversity. In this case, the model extracts data very well from single-sentence prompts and is able to infer the item currency from the price. However, it still extracts item_name as "Rayman fist" even when the item is not mentioned at all, where the output is supposed to be "na", because the dataset contains no negative examples, i.e. sentences that do not mention "Rayman fist". The model is also unable to extract the price of "Rayman fist" when a user sentence buys/sells multiple items, each with its own price, so I'm going to have to improve the model's reasoning and expand the dataset for this sentence type.
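
To make these failure modes concrete, here is a hedged sketch of the three prompt types discussed above. The prompts and the exact output schema are hypothetical illustrations; only the field names (item_name, price, currency) and the "na" convention come from the remarks.

```python
# Hypothetical prompts illustrating the behaviour described above; the
# exact output schema of the model is assumed, not taken from the card.

# 1. Single-sentence prompt mentioning the item: extracted correctly,
#    with the currency inferred from the "$" sign.
prompt = "Selling my Rayman fist for $25."
expected = {"item_name": "Rayman fist", "price": 25, "currency": "USD"}

# 2. Sentence with no mention of the item: should yield "na", but the
#    model hallucinates "Rayman fist" because the dataset has no
#    negative (item-free) sentences.
prompt = "Bought some groceries for $12 this morning."
expected = {"item_name": "na", "price": "na", "currency": "na"}

# 3. Multiple items with individual prices: the model cannot isolate
#    the price that belongs to "Rayman fist".
prompt = "Selling a Rayman fist for $30 and a health orb for $5."
expected = {"item_name": "Rayman fist", "price": 30, "currency": "USD"}
```
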
### Things to note
This model was trained for 12 epochs (I thought 6 would be sufficient, but I guess more is better for models < 3B) at a 1e-4 learning rate with a batch size of 8. One insight I found is that the model (raw safetensors, not quantised) performs very well at 12 epochs, albeit with some flaws due to dataset limitations and model capacity, so I'm going to stick with these settings in future training.
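
For reference, here is a minimal sketch of those settings as a Hugging Face `TrainingArguments` config. It assumes the fine-tune was run with the `transformers` `Trainer`; only the epoch count, learning rate, and batch size come from the notes above, and everything else is an illustrative placeholder.

```python
from transformers import TrainingArguments

# Sketch only: num_train_epochs, learning_rate and
# per_device_train_batch_size are the values noted above; the output
# directory is a hypothetical placeholder, not the actual run's path.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-rayman-extraction",
    num_train_epochs=12,  # 6 seemed enough, but <3B models benefit from more
    learning_rate=1e-4,
    per_device_train_batch_size=8,
)
```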