vllm error: operator _C::marlin_qqq_gemm does not exist

#4
by HourseCircle - opened

python3 -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8000 \
    --enable-auto-tool-choice \
    --tool-call-parser seed_oss \
    --trust-remote-code \
    --model ByteDance-Seed/Seed-OSS-36B-Instruct \
    --chat-template ./chat_template.jinja \
    --served-model-name seed_oss
INFO 08-21 02:46:36 [__init__.py:241] Automatically detected platform cuda.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/entrypoints/openai/api_server.py", line 43, in <module>
    from vllm.engine.async_llm_engine import AsyncLLMEngine # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/engine/async_llm_engine.py", line 18, in <module>
    from vllm.engine.llm_engine import LLMEngine
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/engine/llm_engine.py", line 30, in <module>
    from vllm.executor.executor_base import ExecutorBase
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/executor/executor_base.py", line 18, in <module>
    from vllm.model_executor.layers.sampler import SamplerOutput
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/model_executor/layers/sampler.py", line 16, in <module>
    from vllm.model_executor.layers.utils import apply_penalties
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/model_executor/layers/utils.py", line 8, in <module>
    from vllm import _custom_ops as ops
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/_custom_ops.py", line 440, in <module>
    @register_fake("_C::marlin_qqq_gemm")
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/nfs/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/library.py", line 1023, in register
    use_lib._register_fake(op_name, func, _stacklevel=stacklevel + 1)
  File "/scratch/nfs/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/library.py", line 214, in _register_fake
    handle = entry.fake_impl.register(func_to_register, source)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/nfs/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 31, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator _C::marlin_qqq_gemm does not exist
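
The traceback shows `_custom_ops.py` registering a fake implementation for an operator that the loaded `_C` extension does not define. A minimal sketch of how one might check which operators are missing is below. The helper name `missing_ops` is hypothetical and not part of vLLM or PyTorch. It assumes the namespace object raises `AttributeError` for unknown operators, which is how `torch.ops.<namespace>` is expected to behave.

```python
def missing_ops(namespace, op_names):
    """Return the names from op_names that are not attributes of namespace.

    `namespace` is any object exposing operators as attributes. With a real
    install it would be something like `torch.ops._C` after the compiled
    extension has been loaded (an assumption; this is not done here).
    """
    return [name for name in op_names if not hasattr(namespace, name)]


# Hypothetical usage in an environment where torch and vllm are installed:
#
#   import torch
#   import vllm._C  # loads the compiled extension, if it was built
#   print(missing_ops(torch.ops._C, ["marlin_qqq_gemm"]))
```

A non-empty result would suggest that the Python sources and the compiled binary come from different builds.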

I made it work by removing the VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL variable:

VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/FoolPlayer/vllm.git@seed-oss
pip install git+https://github.com/Fazziekey/transformers.git@seed-oss
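
If the variable is exported somewhere in your shell profile, the same workaround can be scripted. This is only a sketch: the helper names `build_install_env` and `pip_install_command` are made up for illustration, and nothing here is run by the block itself.

```python
import sys

# Variable named in the workaround above; it is dropped before installing.
NIGHTLY_FLAG = "VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL"


def build_install_env(base_env):
    """Return a copy of base_env without the nightly-wheel flag."""
    env = dict(base_env)
    env.pop(NIGHTLY_FLAG, None)        # remove the flag if it is set
    env["VLLM_USE_PRECOMPILED"] = "1"  # same setting as the first pip command
    return env


def pip_install_command(spec):
    """Build the pip invocation for one package spec (not executed here)."""
    return [sys.executable, "-m", "pip", "install", spec]
```

To actually install, pass the result to `subprocess.run(pip_install_command(spec), env=build_install_env(os.environ), check=True)` for each of the two `git+https` specs above.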

Thanks. I encountered the same problem, and your solution worked.

The official vllm repo has approved our MR. Please use the newest vllm commit, as introduced here.
