Help: Trying to load on 2x RTX PRO 6000 96GB

#3
by Fernanda24 - opened

I have CUDA 12.9 and have tried vLLM versions 0.9.2 through 0.10.1, but I am still getting errors with 2x RTX PRO 6000 96GB Blackwell GPUs. The worker log, which is cut off before the final exception message:

(VllmWorker rank=1 pid=3838213) INFO 08-02 16:59:18 [backends.py:194] Cache the graph for dynamic shape for later use
(VllmWorker rank=0 pid=3838212) INFO 08-02 16:59:18 [backends.py:194] Cache the graph for dynamic shape for later use
(VllmWorker rank=1 pid=3838213) INFO 08-02 16:59:35 [backends.py:215] Compiling a graph for dynamic shape takes 17.38 s
(VllmWorker rank=0 pid=3838212) INFO 08-02 16:59:35 [backends.py:215] Compiling a graph for dynamic shape takes 17.44 s
(VllmWorker rank=0 pid=3838212) WARNING 08-02 16:59:45 [fused_moe.py:695] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/giga/vllm/vllm/model_executor/layers/fused_moe/configs/E=128,N=16384,device_name=NVIDIA_RTX_PRO_6000_Blackwell_Workstation_Edition.json
(VllmWorker rank=1 pid=3838213) WARNING 08-02 16:59:45 [fused_moe.py:695] Using default MoE config. Performance might be sub-optimal! Config file not found at /home/giga/vllm/vllm/model_executor/layers/fused_moe/configs/E=128,N=16384,device_name=NVIDIA_RTX_PRO_6000_Blackwell_Workstation_Edition.json
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618] WorkerProc hit an exception.
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618] Traceback (most recent call last):
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/vllm/vllm/v1/executor/multiproc_executor.py", line 613, in worker_busy_loop
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     output = func(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]              ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/vllm/vllm/v1/worker/gpu_worker.py", line 222, in determine_available_memory
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     self.model_runner.profile_run()
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/vllm/vllm/v1/worker/gpu_model_runner.py", line 2229, in profile_run
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     = self._dummy_run(self.max_num_tokens, is_profile=True)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/vllm/vllm/v1/worker/gpu_model_runner.py", line 2010, in _dummy_run
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     outputs = model(
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]               ^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/vllm/vllm/model_executor/models/qwen3_moe.py", line 527, in forward
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/vllm/vllm/compilation/decorators.py", line 239, in __call__
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     output = self.compiled_callable(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 763, in compile_wrapper
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return fn(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/vllm/vllm/model_executor/models/qwen3_moe.py", line 350, in forward
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     def forward(
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 395, in __call__
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return super().__call__(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 962, in _fn
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return fn(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 848, in call_wrapped
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 424, in __call__
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     raise e
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 411, in __call__
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "/home/giga/.pyenv/versions/3.12.11/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]   File "<eval_with_key>.190", line 1615, in forward
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]     submod_2 = self.submod_2(getitem_3, s72, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_qweight_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_scales_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_qzeros_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_g_idx_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_g_idx_sort_indices_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_quant_method_kernel_workspace, getitem_4, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_parameters_weight_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_qweight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_scales_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_qzeros_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_g_idx_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_g_idx_sort_indices_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_quant_method_kernel_workspace, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_);  getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_qweight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_scales_ = 
l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_qzeros_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_g_idx_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_g_idx_sort_indices_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_quant_method_kernel_workspace = getitem_4 = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_qweight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_scales_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_qzeros_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_g_idx_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_g_idx_sort_indices_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_quant_method_kernel_workspace = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ....
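
The trace above is hard to read because every line carries the worker prefix and both ranks are interleaved. As a rough aid, here is a minimal sketch in plain Python (standard library only, not part of vLLM) that keeps the ERROR lines from one rank and strips the prefix. The name `traceback_for_rank` and the prefix pattern are my own assumptions, inferred only from the log lines shown in this post; other vLLM versions may format the prefix differently.

```python
import re

# Assumed prefix shape, taken from the log lines above, e.g.
# "(VllmWorker rank=0 pid=3838212) ERROR 08-02 16:59:46 [multiproc_executor.py:618] "
PREFIX = re.compile(
    r"^\(VllmWorker rank=(?P<rank>\d+) pid=\d+\) "
    r"(?P<level>[A-Z]+) \d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"\[[^\]]+\] ?"
)


def traceback_for_rank(lines, rank="0"):
    """Return the ERROR lines of one worker rank with the log prefix removed.

    Lines without the prefix (such as wrapped continuation lines) are skipped.
    """
    out = []
    for line in lines:
        m = PREFIX.match(line)
        if m and m.group("rank") == rank and m.group("level") == "ERROR":
            # Keep everything after the prefix so traceback indentation survives.
            out.append(line[m.end():].rstrip("\n"))
    return out


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python strip_prefix.py < vllm.log
    for text in traceback_for_rank(sys.stdin):
        print(text)
```

Piping the full, untruncated log through this should leave an ordinary Python traceback for rank 0, ending with the actual exception type and message, which is the part missing from the paste above.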
