My Submission has Failed [REPORT HERE]
Before writing to this thread, ask yourself the following questions and post only if you answer YES to all of them.
- Did your code run in a local environment?
- Did you use Python 3.10?
- Did you push your code to HuggingFace as a model?
- Can your model process 10,000 images in 60 minutes in an environment similar to a t4-small?
- Did you use any non-standard library? [If yes, request it in a different thread.]
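For a rough sense of what the last requirement demands, the implied throughput can be worked out directly (this just restates the checklist's 10,000-images-in-60-minutes limit as a rate):

```python
# Back-of-the-envelope throughput implied by the checklist limit:
# 10,000 images within a 60-minute window.
images = 10_000
budget_seconds = 60 * 60

required_rate = images / budget_seconds          # images per second
max_seconds_per_image = budget_seconds / images  # seconds per image

print(f"{required_rate:.2f} img/s, {max_seconds_per_image:.2f} s/img")
# → 2.78 img/s, 0.36 s/img
```

So a model that averages more than about a third of a second per image on comparable hardware is unlikely to finish in time.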
Please provide your submission ID when asking for an error log.
@chychiu If you are interested in submitting a working note, we will most likely reopen submissions on Monday to allow post-competition submissions for ablation evaluations, etc. Those submissions won't change the competition ranking, but it should still allow you to submit a "better" run :]
Sad you started so late with the submissions :/
LP
Thanks for letting me know. I have submitted one last try: 0a748de1-3ffb-44ea-9946-0836a13ee180
If it fails, I give up. Please still let me know the issue; I will check tomorrow morning. It's 5:50am right now and I am so over it.
Shame the HF platform has so many issues (I can't even do multiprocessing with a DataLoader???). This is a week of my life I will never get back. Learnt a lot though (although hacky), so no regrets :)
I reduced the batch size because I wasn't sure what was failing. This is genuinely disheartening that huggingface is so terrible and that the competition had so many issues as a result. It was a fun project that was well motivated but unfortunately did not come to fruition. Let me know when you open the late submission. I hope in the future another competition platform can be considered instead.
Can I get the stack trace for a45d4a7e-be41-475f-917c-a3612b574669? Apologies.
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
We do not have the budget for the GPU inference, so we had to switch it to CPU only.
LP
Very strange. I'll have to do some digging. I explicitly set the device to "cpu" since I read your note elsewhere about the CPU restriction.
Try this one:
net.load_state_dict(torch.load('classifier.pt', map_location=torch.device('cpu')))
LP
@picekl
Looks like that probably did it. I guess in the future I need to make sure my device setting makes its way to the map_location param.
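A minimal sketch of that pattern — resolving the device once and threading it through both model placement and `map_location` — might look like the following. The `resolve_device` helper and the surrounding names are hypothetical, not taken from the competition code:

```python
def resolve_device(prefer_gpu: bool, cuda_available: bool) -> str:
    """Pick one device string and use it everywhere: model placement,
    input tensors, and torch.load's map_location. On a CPU-only grader
    this returns 'cpu', avoiding the CUDA deserialization RuntimeError."""
    return "cuda" if (prefer_gpu and cuda_available) else "cpu"

# In an actual submission script (assuming torch is importable):
#   device = resolve_device(prefer_gpu=True, cuda_available=torch.cuda.is_available())
#   state = torch.load("classifier.pt", map_location=device)
#   net.load_state_dict(state)
#   net.to(device)

print(resolve_device(True, False))  # CPU-only grader → cpu
print(resolve_device(True, True))   # local GPU box   → cuda
```

The point is simply that there is a single source of truth for the device, so a checkpoint saved on GPU can't sneak a CUDA storage past a CPU-only environment.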
If I had a dollar for every time huggingfail dies on me, I would have $20 by now. Can you check why this run failed? @picekl
74758e89-d7aa-46ba-a8ea-030a318eb03f
I made sure everything is on CPU
Hi @chychiu , and welcome back! Looks like this is still the same problem as previously.
0%| | 1/XXXXX. [00:49<182:27:29, 49.00s/it]2024-06-01 21:58:27.565 | ERROR | __main__:generate_submission_file:72 - Subprocess didn't terminate successfully
Can you contact me via email? I might have a solution, but I do not want to bother the rest of the people.
Best,
Lukas