[ERROR] Unexpected error from cudaGetDeviceCount() - zeroGPU
I've been trying to set up a Space with ZeroGPU. Here's my Space and code: https://huggingface.co/spaces/pinyuchen/Diveye_AI_text_detector/tree/main
(app.py is all my code atm, used to be modularized)
Here's the recurring error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 135, in worker_init
    torch.init(nvidia_uuid)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 373, in init
    torch.Tensor([0]).cuda()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 314, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS
There is a similar discussion with no answer other than to "unmodularize" everything into a single app.py, but that doesn't seem to work either.
Any sort of help would be really appreciated!
#import xgboost as xgb
With this and a few more changes, I was able to avoid the error itself, but commenting this out is not a good idea...
There are other libraries that also reference CUDA when they are imported, which ultimately causes a crash. Quanto, for example.
I see.
Do you suggest shifting the model loading (xgb) and the import of the library into the @spaces.GPU function? Is that a way to fix things?
I hadn't thought of that. There seems to be some overhead with import, but that method might indeed avoid this error...
It would be smarter if the library itself were structured so that it could be offloaded to the CPU, but that would probably require forking it on GitHub and customizing it for our use...
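On the import overhead: in plain Python, an import inside a function only does the real work the first time within a given process, and later calls reuse the cached module from `sys.modules`. Here is a minimal sketch of that behavior, using the small stdlib module `colorsys` as a stand-in for a heavy library. Whether the cache survives between ZeroGPU calls depends on how worker processes are reused, which this thread does not establish.

```python
import sys


def get_module():
    # Stand-in for a heavy library import (e.g. xgboost); `colorsys` is
    # only used here because it is a small stdlib module.
    import colorsys
    return colorsys


first = get_module()   # first call performs the actual import
second = get_module()  # later calls in the same process hit sys.modules

# Both calls return the very same module object.
print(first is second)
print("colorsys" in sys.modules)
```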
I think I solved this one when the bert-beatrix thing was acting up. You can probably find the fix in the source code.
https://huggingface.co/spaces/AbstractPhil/bert-beatrix-2048-testing/blob/main/app.py
Seems to work now, @John6666 @AbstractPhil. Thanks for pointing the problem out! <3
The xgb import and model loading were moved into the @spaces.GPU function and the error is gone now; thanks again!
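For anyone landing here later, a minimal sketch of the pattern described above, not the actual code from the Space. It assumes the `spaces` package is available (with a no-op fallback so it also runs locally), and `MODEL_PATH`, `predict`, and `features` are hypothetical names chosen for illustration.

```python
try:
    import spaces  # provided on Hugging Face Spaces
    gpu = spaces.GPU
except ImportError:
    # Assumption: local fallback so the sketch runs without `spaces`.
    def gpu(fn):
        return fn

MODEL_PATH = "model.json"  # hypothetical path to a saved XGBoost model


@gpu
def predict(features):
    # Deferred import: xgboost is imported only when this function runs,
    # not when app.py is loaded at startup.
    import xgboost as xgb

    booster = xgb.Booster()
    booster.load_model(MODEL_PATH)
    return booster.predict(xgb.DMatrix(features))
```

The point is simply that nothing at module level imports xgboost, so no CUDA-related code from that library runs before the GPU function is called.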