RuntimeError: Unexpected error from cudaGetDeviceCount()

#149

by Surn - opened Jan 31


Surn

Jan 31

I have spent about a day trying to get this to work. I assumed the cause was improperly imported libraries or the ordering in requirements.txt, but nothing I changed there has fixed it.

Any ideas? Could it be where the @spaces.GPU decorator is placed?

Do I need to manually load TensorFlow?

Here is the full error output from startup:

===== Application Startup at 2025-01-31 07:19:48 =====

2025-01-31 08:21:13.821347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738308073.839203       1 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738308073.844942       1 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-31 08:21:13.864817: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

preprocessor_config.json:   0%|          | 0.00/285 [00:00<?, ?B/s]
preprocessor_config.json: 100%|██████████| 285/285 [00:00<00:00, 1.44MB/s]

config.json:   0%|          | 0.00/942 [00:00<?, ?B/s]
config.json: 100%|██████████| 942/942 [00:00<00:00, 4.98MB/s]

model.safetensors:   0%|          | 0.00/1.37G [00:00<?, ?B/s]
model.safetensors:  22%|██▏       | 304M/1.37G [00:01<00:03, 303MB/s]
model.safetensors: 100%|█████████▉| 1.37G/1.37G [00:01<00:00, 695MB/s]
/usr/local/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/usr/local/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")

themes/[email protected]:   0%|          | 0.00/18.2k [00:00<?, ?B/s]
themes/[email protected]: 100%|██████████| 18.2k/18.2k [00:00<00:00, 61.5MB/s]

ZeroGPU tensors packing: 0.00B [00:00, ?B/s]
ZeroGPU tensors packing: 0.00B [00:00, ?B/s]
* Running on local URL:  http://0.0.0.0:7860, with SSR ⚡ (experimental, to disable set `ssr=False` in `launch()`)

To create a public link, set `share=True` in `launch()`.
input: [[0, 0, 0, 0], [255, 255, 255, 0]]
output: [(0, 0, 0, 0), (255, 255, 255, 0)]
input: [[0, 0, 0, 0], [255, 255, 255, 0]]
output: [(0, 0, 0, 0), (255, 255, 255, 0)]
Local GPU available. Generating image locally.
Generating image with the following parameters:
Model: ostris/Flex.1-alpha
LoRA Weights: ['Cossale/Frames2-Flex.1']
Prompt: FRM$ eight_color (tabletop_map built from small hexagon pieces) as ((empty black on all sides), barren alien_world_map), with light_blue_is_rivers and brown_is_mountains and red_is_volcano and [white_is_snow at the top and bottom of map] as (four_color background: light_blue, green, tan, brown), horizontal_gradient is (brown to tan to green to light_blue to blue) and vertical_gradient is (white to blue to (green, tan and red) to blue to white), (middle is dark, no_reflections, no_shadows), ((partial hexes on edges and sides are black))
Neg Prompt: humans, modern_buildings, vehicles, text, logos, reflections, shadows, realistic map of the Earth, isometric
Height: 512
Width: 384
Number of Inference Steps: 50
Guidance Scale: 3.5
Seed: 44169
Additional Parameters: {'num_inference_steps': 50}
Conditioned Image: None
pipeline: FluxPipeline
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 135, in worker_init
    torch.init(nvidia_uuid)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 373, in init
    torch.Tensor([0]).cuda()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 314, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS

Error generating AI image: 'RuntimeError'
Failed to open generated image: 'NoneType' object has no attribute 'read'

Code snippet that is relevant for troubleshooting:

os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:256,expandable_segments:True"
IS_SHARED_SPACE = "Surn/*PRIVATE*" in os.environ.get('SPACE_ID', '')

# Set the temporary folder location
#os.environ['TEMP'] = r'e:\\TMP'
#os.environ['TMPDIR'] = r'e:\\TMP'
#os.environ['XDG_CACHE_HOME'] = r'E:\\cache'
os.environ['USE_FLASH_ATTENTION'] = '1'
#os.environ['XFORMERS_FORCE_DISABLE_TRITON']= '1'
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["PYTORCH_NVML_BASED_CUDA_CHECK"] = "1"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"
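One thing that may be worth ruling out, offered as an assumption rather than a confirmed cause: the overrides above are applied unconditionally, including `CUDA_VISIBLE_DEVICES`, while on ZeroGPU the `spaces` package initialises the device itself (see `worker_init` in the traceback). Below is a minimal sketch that applies the CUDA-related overrides only when `SPACES_ZERO_GPU` is not enabled. `LOCAL_CUDA_ENV` and `apply_local_cuda_env` are hypothetical names, and the set of truthy strings is a guess at how the flag is written.

```python
import os

# CUDA-related overrides from the snippet above, intended for a local multi-GPU machine.
LOCAL_CUDA_ENV = {
    "CUDA_DEVICE_ORDER": "PCI_BUS_ID",
    "PYTORCH_NVML_BASED_CUDA_CHECK": "1",
    "CUDA_VISIBLE_DEVICES": "0,1,2,3",
}


def apply_local_cuda_env(env=os.environ):
    """Set the overrides unless the process looks like a ZeroGPU Space.

    Returns the list of variable names that were set (empty on ZeroGPU).
    """
    # Assumption: SPACES_ZERO_GPU holds a truthy string such as "true" on ZeroGPU.
    flag = env.get("SPACES_ZERO_GPU", "").strip().lower()
    if flag in {"1", "true", "yes"}:
        return []
    for name, value in LOCAL_CUDA_ENV.items():
        env[name] = value
    return list(LOCAL_CUDA_ENV)


# Demonstration on plain dicts, so the real environment is left untouched.
print(apply_local_cuda_env({"SPACES_ZERO_GPU": "true"}))  # ZeroGPU: nothing applied
print(apply_local_cuda_env({}))                           # elsewhere: all three applied
```

In a real app.py the call would be `apply_local_cuda_env()` near the top of the file, before anything initialises CUDA.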

Excerpt from requirements.txt:

accelerate
invisible_watermark
# Updated versions 2.4.0+cu118
torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118/torch-2.4.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=80f75f98282dfcca50a013ce14ee6a4385680e1c15cb0e9b376612442137ead5
torchvision --index-url https://download.pytorch.org/whl/cu118
torchaudio --index-url https://download.pytorch.org/whl/cu118
xformers==0.0.27.post2 --index-url https://download.pytorch.org/whl/cu118/xformers-0.0.27.post2%2Bcu118-cp310-cp310-manylinux2014_x86_64.whl#sha256=b3cdeeb9eae4547805ab8c3c645ac2fa9c6da85b46c039d9befa117e9f6f22fe

# Other dependencies
Haishoku
pybind11>=2.12
huggingface_hub
# git+https://github.com/huggingface/[email protected]#egg=transformers
transformers==4.48.1
gradio[oauth]
Pillow
numpy
requests
# git+https://github.com/huggingface/diffusers
diffusers[torch]

The code block where it actually crashes:

@spaces.GPU(duration=140)
def generate_image_lowmem(
    text,
    neg_prompt=None,
    model_name="black-forest-labs/FLUX.1-dev",
    lora_weights=None,
    conditioned_image=None,
    image_width=1368,
    image_height=848,
    guidance_scale=3.5,
    num_inference_steps=30,
    seed=0,
    true_cfg_scale=1.0,
    pipeline_name="FluxPipeline",
    strength=0.75,
    additional_parameters=None
):
    print(f"\n {get_torch_info()}\n")
    # Retrieve the pipeline class from the mapping
    pipeline_class = PIPELINE_CLASSES.get(pipeline_name)
    if not pipeline_class:
        raise ValueError(f"Unsupported pipeline type '{pipeline_name}'. "
                        f"Available options: {list(PIPELINE_CLASSES.keys())}")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"device:{device}\nmodel_name:{model_name}\nlora_weights:{lora_weights}\n")

Here is my project footer with all the relevant info:

Surn

Feb 13

Someone must know what this is. I have reinstalled, installed newer versions of torch, uninstalled, and rearranged the code. I even moved the Space to a new location and rebuilt it from a blank template.

Surn

Feb 13

All the relevant ZeroGPU environment variables:

from __future__ import annotations

import os
from pathlib import Path

from .utils import boolean


ZEROGPU_OFFLOAD_DIR_DEFAULT = str(Path.home() / '.zerogpu' / 'tensors')


class Settings:
    def __init__(self):
        self.zero_gpu = boolean(
            os.getenv('SPACES_ZERO_GPU'))
        self.zero_device_api_url = (
            os.getenv('SPACES_ZERO_DEVICE_API_URL'))
        self.gradio_auto_wrap = boolean(
            os.getenv('SPACES_GRADIO_AUTO_WRAP'))
        self.zero_patch_torch_device = boolean(
            os.getenv('ZERO_GPU_PATCH_TORCH_DEVICE'))
        self.zero_gpu_v2 = boolean(
            os.getenv('ZEROGPU_V2'))
        self.zerogpu_offload_dir = (
            os.getenv('ZEROGPU_OFFLOAD_DIR', ZEROGPU_OFFLOAD_DIR_DEFAULT))


Config = Settings()


if Config.zero_gpu:
    assert Config.zero_device_api_url is not None, (
        'SPACES_ZERO_DEVICE_API_URL env must be set '
        'on ZeroGPU Spaces (identified by SPACES_ZERO_GPU=true)'
    )
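To see which of these variables are actually present when the Space starts, something like the following could be dropped near the top of app.py. It is only a sketch: `ZERO_ENV_VARS`, `zero_env_report`, and `check_zero_env` are hypothetical names, the variable names are copied from the module quoted above, and comparing against the string "true" is an assumption about how the `boolean` helper (not shown here) reads the flag.

```python
import os

# Variable names as they appear in the settings class quoted above.
ZERO_ENV_VARS = (
    "SPACES_ZERO_GPU",
    "SPACES_ZERO_DEVICE_API_URL",
    "SPACES_GRADIO_AUTO_WRAP",
    "ZERO_GPU_PATCH_TORCH_DEVICE",
    "ZEROGPU_V2",
    "ZEROGPU_OFFLOAD_DIR",
)


def zero_env_report(env=os.environ):
    """Return {name: value or None} for each ZeroGPU-related variable."""
    return {name: env.get(name) for name in ZERO_ENV_VARS}


def check_zero_env(env=os.environ):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    # Mirrors the assertion in the quoted module: the API URL is required on ZeroGPU.
    # Assumption: "true" (case-insensitive) enables the flag. Unlike the original
    # assertion, an empty URL is also treated as missing here.
    zero_gpu = env.get("SPACES_ZERO_GPU", "").lower() == "true"
    if zero_gpu and not env.get("SPACES_ZERO_DEVICE_API_URL"):
        problems.append(
            "SPACES_ZERO_GPU is set but SPACES_ZERO_DEVICE_API_URL is missing"
        )
    return problems


# Print only whether each variable is set, so no values end up in the logs.
for name, value in zero_env_report().items():
    print(f"{name}: {'set' if value is not None else 'unset'}")
print(check_zero_env() or "no problems found")
```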

Surn

Feb 16

Weeks later, I have figured out that everything needs to live in one contiguous place. Normally we modularize and generalize our functions, but here app.py has to be written as if it were a demo for a student, with only minor function calls leaving the main logical thread. The result is an extremely long, hard-to-maintain code base.

I have had to completely recreate my project.

Surn changed discussion status to closed Feb 17
