[ERROR] Unexpected error from cudaGetDeviceCount() - zeroGPU

#168
by FloofCat - opened

I've been trying to set up a Space with ZeroGPU. Here are my Space and code: https://huggingface.co/spaces/pinyuchen/Diveye_AI_text_detector/tree/main
(app.py currently holds all my code; it used to be modularized.)

Here's the recurring error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 135, in worker_init
    torch.init(nvidia_uuid)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 373, in init
    torch.Tensor([0]).cuda()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 314, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS

A similar discussion got no answer other than to "unmodularize" everything into a single app.py, and that doesn't seem to work either.

Any sort of help would be really appreciated!

FloofCat changed discussion title from [ERROR] Unable to run my app.py: to [ERROR] Unexpected error from cudaGetDeviceCount() - zeroGPU
#import xgboost as xgb

With this and a few more changes, I was able to avoid the error itself, but commenting this out is not a good idea... 😅
There are other libraries that also reference CUDA when importing, which ultimately causes a crash. Quanto, for example.

I see.

Do you suggest shifting the model loading (xgb) and the import of the library into the @spaces.GPU function? Is that a way to fix things?

> Do you suggest shifting the model loading (xgb) and the import of the library into the @spaces.GPU function? Is that a way to fix things?

I hadn't thought of that. There seems to be some overhead with import, but that method might indeed avoid this error...
It would be smarter if the library itself were structured so that it could be offloaded to the CPU, but that would probably require forking it on GitHub and customizing it for our use...

I think I solved this one when the bert-beatrix thing was acting up. You can probably find the fix in the source code.
https://huggingface.co/spaces/AbstractPhil/bert-beatrix-2048-testing/blob/main/app.py

Seems to work now, @John6666 @AbstractPhil. Thanks for pointing the problem out! <3

The xgb import and model loading were moved into the @spaces.GPU function and the error is gone now; thanks again!
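
For anyone hitting the same error, here is a rough sketch of what "moving xgb into the @spaces.GPU function" can look like. It is not the actual code from the Space: the function name `predict`, the `MODEL_PATH` value, and the local fallback decorator are made up for illustration. The idea is only that nothing at module level imports xgboost, so the main process never touches CUDA before the ZeroGPU worker starts.

```python
import sys

try:
    import spaces  # provided on Hugging Face Spaces
    gpu = spaces.GPU
except ImportError:
    # Assumption for local testing only: a no-op stand-in for spaces.GPU.
    def gpu(fn):
        return fn

MODEL_PATH = "model.json"  # hypothetical path to a saved XGBoost model


@gpu
def predict(features):
    # Deferred import: xgboost is loaded only when this function runs,
    # not when app.py is imported at startup.
    import xgboost as xgb

    booster = xgb.Booster()
    booster.load_model(MODEL_PATH)
    return booster.predict(xgb.DMatrix(features))


# At this point the module has been fully defined without importing xgboost.
print("xgboost imported at startup:", "xgboost" in sys.modules)
```

On the import overhead mentioned earlier: Python caches imported modules per process, so a repeated `import xgboost` in the same process is cheap. Whether ZeroGPU reuses the same worker process between calls is not something this thread establishes, so the model load may still run on every call.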

FloofCat changed discussion status to closed
