This Works Pretty Well

#1
by attackparent - opened

I've done some experimenting with this model, and I'm quite impressed. I understand how painstakingly tedious it is to manually label a dataset. I have found a few issues with the model, though, and would like to offer some help in training v3. I gave this task a shot back in 2023 and failed miserably. However, I have a dataset of my own of roughly 400 labeled images, and I have an RTX 4090 that can train YOLOv8s models with a batch size of 32.

Hit me up if you are interested in collaborating on v3.

Didn't realize people actually monitored my Hugging Face posts, so I do apologize for the late response. I've been training/testing other things these days, so I haven't gotten around to doing v3. I've heard suggestions from other people over on Civitai about how they'd like it to naturally detect pairs of feet--which could be done, but it would mean I'd need to re-label the entire dataset. I don't mind collaborations, especially if you've got a more powerful machine than I do XD.
What did you have in mind? I don't mind manually labelling things again, it'll just take an afternoon or two. I've been training on yolov8x models for these two versions, so I'm not sure how that will affect things on your end if you end up training it.
For the most part, I had been following sp00ns' guide for the training process.

Unlike SDXL, I am able to train yolo models on my PC fairly quickly (a day, as opposed to 1-3 weeks for SDXL, lol)

Thanks for responding! Don't worry about missing the message.
Link me to the Civitai convo if you can. There are some clever people in the training community there.

The main problem I'm seeing with detection on the model is that in some orientations it will detect 2 feet where there is only 1. ADetailer has tools to limit the inpainting to the n best matches, but that only works around the problem instead of fixing it. I think your dataset probably needs to be expanded. I'll paste an example later.
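For anyone following along, here's a rough sketch of what "keep the n best matches" amounts to, plus an overlap check that drops a weaker box sitting on top of a stronger one. This is not ADetailer's actual code; the function names, the `(x1, y1, x2, y2)` box layout, and the 0.5 overlap threshold are all my own assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout
Detection = Tuple[Box, float]            # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_best(detections: List[Detection], top_n: int,
              iou_threshold: float = 0.5) -> List[Detection]:
    """Keep at most top_n detections, highest confidence first,
    skipping any box that overlaps an already kept box too much."""
    kept: List[Detection] = []
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if len(kept) >= top_n:
            break
        if all(iou(box, other) < iou_threshold for other, _ in kept):
            kept.append((box, conf))
    return kept


# Made-up example: the first two boxes cover the same foot.
detections = [
    ((10, 10, 110, 210), 0.9),
    ((15, 12, 112, 205), 0.6),
    ((300, 20, 400, 220), 0.8),
]
best_two = keep_best(detections, top_n=2)
```

The catch is that this only helps when the duplicate boxes overlap heavily. If the model splits one foot into two boxes that barely touch, filtering after the fact won't merge them, which is why more training data is the better fix.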

As for training SDXL, there's some good news on that front. Kohya supports fp8 training now. With 24 GB of VRAM I can do a batch size of 5 at 1024x1024, which reduces training time to about 15 minutes for a character. It's amazing.

You should be able to train without memory overflow at a batch size of 1 now.

By convo, do you mean the suggestions I've received on Civitai? (Also, sorry once again for the late response--I really rarely monitor my Hugging Face stuff lol; I still have yet to post my latest LoRAs/checkpoints here)
https://civitai.com/models/257904/adetailer-footyolov8xpt
It would be found in the link above, but it's brief.
One of the suggestions I'd received was to have it be able to detect 2 feet at a time, so that footwear can match up. My response to that was somewhere along the lines that I'd already purposely trained it to detect 2 feet at the same time, but only when they were very close together. There are some pros and cons to detecting pairs of feet, as opposed to individually.
And then there's this new World model, which has caught my attention. If I were to make a version 3, I think I'd want it to be able to detect pairs of feet as well as individual feet, depending on the input parameter.
On top of all of that, I'd want version 3 to have a segment-based mask instead of the box masks that versions 1 and 2 have. With some checkpoints, there's an obvious discolouration/residue left over from where the mask was, and segmentation would mitigate that effect. Segment masking would make perfect sense with feet, as generally the overall shape stays the same, unless it's like a closeup shot with toes spread apart or something.
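To put a rough number on the box vs. segment difference, here's a small sketch that compares the area of an outline with the area of its bounding box. The function names are hypothetical and the outline below is a made-up triangle, not a real label from the dataset.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (x1, y1, x2, y2) containing the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def box_overreach(points: List[Point]) -> float:
    """Fraction of the bounding box that lies outside the outline."""
    x1, y1, x2, y2 = bounding_box(points)
    box_area = (x2 - x1) * (y2 - y1)
    if box_area == 0:
        raise ValueError("outline has no area")
    return 1.0 - polygon_area(points) / box_area


# Made-up outline: a right triangle fills half of its bounding box.
outline = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
extra = box_overreach(outline)
```

Whatever fraction comes out is the share of a box mask that gets inpainted even though it isn't part of the foot, and that is the region where the residue would show up.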

Haven't heard about this fp8 training stuff before... I should check it out. I only have 8GB VRAM at my disposal, so anything that speeds up SDXL LoRA/checkpoint training would be nice. I'm currently testing out OneTrainer to train a hefty model with a dataset of 1000+ images. Since I've only got 8GB, I've limited my batch size to 1, and since I want it to be high quality, I left the resolution at 1024x1024. It's going to take a literal week to finish training. But, to be fair, this is a vast improvement over what I tried doing with Kohya with the same dataset; it would have taken me a month or two. My eyes are always on the lookout for new ways to reduce the VRAM consumption for training SDXL models.
