Validation data

#2
by SmilingWolf - opened

Hi! Sorry to bother you, but would it be possible to share at least the validation dataset?

All the data used in the V3 taggers is available here for inspection: https://huggingface.co/datasets/SmilingWolf/wdtagger-v3-seed

Can do. Is it for a comparison against yours?

I think the split is a little small. Originally I was basing it off JoyTag's 32,768, but I should've considered the number of tags that wouldn't end up in it.

Testing it I found these stats:

Analyzed 20116 samples in the split
Found 31169 unique tags (out of 70527 possible tags)

Tag distribution by category:
general: 14387 tags (46.6% of all general tags)
character: 9803 tags (36.4% of all character tags)
artist: 4671 tags (66.7% of all artist tags)
copyright: 2081 tags (38.8% of all copyright tags)
meta: 203 tags (62.8% of all meta tags)
year: 20 tags (100.0% of all year tags)
rating: 4 tags (100.0% of all rating tags)

That was better than expected, but of course still not the majority.
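The per-category coverage figures above could be computed with a sketch like the following. The sample layout (`{"tags": [...]}`), the tag-to-category mapping, and the per-category totals are assumptions for illustration, not the actual dataset format:

```python
from collections import defaultdict

def tag_coverage(samples, tag_category, totals_by_category):
    """For each category, count how many of its tags appear at least
    once in the split, and what fraction of all tags that covers."""
    seen = defaultdict(set)
    for sample in samples:
        for tag in sample["tags"]:
            seen[tag_category[tag]].add(tag)
    stats = {}
    for cat, total in totals_by_category.items():
        found = len(seen.get(cat, ()))
        stats[cat] = (found, 100.0 * found / total if total else 0.0)
    return stats
```

Running this over the split and printing `found`/`percent` per category would produce a table like the one above.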

> Can do. Is it for a comparison against yours?

Busted :D

Yeah, I was mostly worried about the number of true positives in the test set.
With my filtering and 300k samples in the val split, I still end up with a minimum of 15 samples per tag, which is defo not great.
I hypothesised that, with lower requirements for the number of samples per tag, you may have been "saving" more images on the tags-per-image front, in turn getting a few more tags in, but it still felt somewhat low.

As a comparison, the val split in my dataset has got 8106 general tags, and a min amount of samples per tag of 15.
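The min-samples-per-tag filtering discussed here could look something like this minimal sketch (the sample layout and threshold value are assumptions, not either tagger's actual pipeline):

```python
from collections import Counter

def filter_tags_by_min_count(samples, min_count=15):
    """Keep only tags that appear in at least `min_count` samples."""
    counts = Counter(tag for s in samples for tag in s["tags"])
    kept = {tag for tag, c in counts.items() if c >= min_count}
    return kept, counts
```

The trade-off both sides describe is visible here: a lower `min_count` keeps more (rarer) tags in the vocabulary, at the cost of fewer positives per tag in the val split.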

Here are the stats for the test set:

=== Tag Statistics Per Sample ===
Total samples analyzed: 20116
Tags per sample:
Minimum: 29
Maximum: 628
Mean: 45.64
Median: 43.00
Standard deviation: 13.30

Percentiles (tags per sample):
10th percentile: 34.0
25th percentile: 37.0
75th percentile: 51.0
90th percentile: 61.0
95th percentile: 68.0
99th percentile: 91.0

Sample distribution:
Samples with 0 tags: 0 (0.00%)
Samples with 1-5 tags: 0 (0.00%)
Samples with 6-10 tags: 0 (0.00%)
Samples with 11-20 tags: 0 (0.00%)
Samples with 21-50 tags: 14935 (74.24%)
Samples with 51+ tags: 5181 (25.76%)
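Stats like the ones above can be reproduced with the standard library alone. This is a hedged sketch assuming the input is just a list of per-sample tag counts; the percentile uses a simple nearest-rank method, which may differ slightly from whatever was used originally:

```python
import statistics

def tags_per_sample_stats(tag_counts):
    """Summary stats over a list of tags-per-sample counts."""
    n = sorted(tag_counts)

    def pct(p):
        # nearest-rank percentile over the sorted counts
        idx = max(0, min(len(n) - 1, round(p / 100 * (len(n) - 1))))
        return n[idx]

    return {
        "min": n[0],
        "max": n[-1],
        "mean": statistics.mean(n),
        "median": statistics.median(n),
        "stdev": statistics.stdev(n),
        "p90": pct(90),
    }
```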

Would you still like me to upload the test dataset for the performance comparison, or to verify any of this? Note that my dataset doesn't include the images, just the paths, as I resized on the fly.

> Would you still like me to upload the test dataset for the performance comparison or to verify any of this?

Yeah if you don't mind. Getting the images is not a problem, I'll handle that myself, as long as the danbooru ID is available somewhere.

It's in training/val_dataset.csv. Note that the rating and year tags are in a different format.

Thank you!
I'll work on it over the weekend and report the results here. Code will be uploaded somewhere public (likely GitHub).

🫡

Dumped it all here: https://huggingface.co/datasets/SmilingWolf/camie-tagger-vs-wd-tagger-val
Decided to use Hugging Face in case it becomes necessary to upload the probability dumps or other huge binary data.

Thanks, I'll add it to the comparison section.

It seems that your model is considerably more accurate on common general tags. I think I definitely set the threshold for the minimum number of samples too low. Originally my thinking was more = better because of the hierarchical relationship between the categories (general helps predict character, which helps predict copyright, etc.).

It's interesting to note that, for the 14387 general tags in the validation set, my macro score is similar to or higher than yours when ignoring the top 5000 tags (0.204 vs 0.196). I'm not quite sure why this might be the case, as you would assume the model would generally get worse as the tags get rarer.
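A macro score restricted to tags outside the top-K most frequent, as in the comparison above, could be computed roughly like this. Everything here is a hypothetical sketch: the per-tag `(tp, fp, fn)` layout and the frequency map are assumed inputs, and macro F1 is used as the metric, which may not match the exact score either side reported:

```python
def macro_f1_excluding_top_k(per_tag_counts, tag_frequencies, k=5000):
    """Macro F1 over all tags except the k most frequent ones.

    per_tag_counts: {tag: (tp, fp, fn)} accumulated over the val split.
    tag_frequencies: {tag: sample count} used to rank tags by frequency.
    """
    top_k = set(sorted(tag_frequencies, key=tag_frequencies.get, reverse=True)[:k])
    f1s = []
    for tag, (tp, fp, fn) in per_tag_counts.items():
        if tag in top_k:
            continue
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s) if f1s else 0.0
```

Note that with macro averaging, every rare tag contributes equally, which is why a low min-samples-per-tag threshold can swing this number so much.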

I'm quite happy with how the character tags turned out, but I'm going to keep training as there should be some improvements to add.
