KeyError: 'embed_tokens.weight'
(VllmWorker rank=5 pid=234) INFO 06-08 00:59:30 [gpu_model_runner.py:1542] Loading drafter model...
(VllmWorker rank=5 pid=234) INFO 06-08 00:59:30 [weight_utils.py:291] Using model weights format ['*.bin']
(VllmWorker rank=7 pid=236) INFO 06-08 00:59:41 [weight_utils.py:307] Time spent downloading weights for yuhuili/EAGLE3-LLaMA3.3-Instruct-70B: 11.785606 seconds
Loading pt checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:06<00:00, 6.74s/it]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:06<00:00, 6.74s/it]
(VllmWorker rank=0 pid=229)
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] WorkerProc failed to start.
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] Traceback (most recent call last):
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 461, in worker_main
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] worker = WorkerProc(*args, **kwargs)
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 358, in init
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] self.worker.load_model()
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_worker.py", line 164, in load_model
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] self.model_runner.load_model()
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1543, in load_model
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] self.drafter.load_model(self.model)
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/spec_decode/eagle.py", line 321, in load_model
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] self.model = get_model(vllm_config=self.vllm_config,
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/init.py", line 58, in get_model
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] return loader.load_model(vllm_config=vllm_config,
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 277, in load_model
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] loaded_weights = model.load_weights(
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama_eagle3.py", line 252, in load_weights
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] loaded_weights = loader.load_weights(model_weights.items())
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 277, in load_weights
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] autoloaded_weights = set(self._load_module("", self.module, weights))
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 235, in _load_module
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] yield from self._load_module(prefix,
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/utils.py", line 208, in _load_module
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] loaded_params = module_load_weights(weights)
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama_eagle3.py", line 168, in load_weights
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] param = params_dict[name]
(VllmWorker rank=7 pid=236) ERROR 06-08 00:59:48 [multiproc_executor.py:487] KeyError: 'embed_tokens.weight'
[rank0]:[W608 00:59:50.332732135 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Setup: tensor parallel size 8, draft model yuhuili/EAGLE3-LLaMA3.3-Instruct-70B
INFO 06-08 01:12:32 [api_server.py:1289] vLLM API server version 0.9.0.1
Target model: meta/Llama-3.3-70B-Instruct
vLLM arguments: --speculative_config '{"method": "eagle3", "model": "yuhuili/EAGLE3-LLaMA3.3-Instruct-70B", "num_speculative_tokens": 3, "draft_tensor_parallel_size": 8, "max_model_len": 512
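Since the argument above was pasted truncated, here is a minimal sketch of one way to generate a well-formed, shell-quoted value for the flag. It uses only the five keys visible in this report; the original command may have contained more. The helper name `build_flag` is made up for illustration and is not part of vLLM.

```python
import json
import shlex

# Only the keys visible in the report; the pasted command is cut off,
# so the real config may have had additional entries.
speculative_config = {
    "method": "eagle3",
    "model": "yuhuili/EAGLE3-LLaMA3.3-Instruct-70B",
    "num_speculative_tokens": 3,
    "draft_tensor_parallel_size": 8,
    "max_model_len": 512,
}


def build_flag(config: dict) -> str:
    """Serialize the config to JSON and quote it for a POSIX shell.

    Hypothetical helper, not a vLLM function.
    """
    payload = json.dumps(config)
    return "--speculative_config " + shlex.quote(payload)


flag = build_flag(speculative_config)
print(flag)
```

Generating the string this way avoids hand-typed quoting mistakes such as a missing closing brace or quote.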
The target model loads onto the GPUs and works fine without the EAGLE head. With the EAGLE head enabled, startup fails with the error above.
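The last frames of the traceback show the lookup `param = params_dict[name]` failing for the name `embed_tokens.weight`, meaning the draft checkpoint supplied a weight name that the loader's parameter dictionary did not contain. The sketch below reproduces that failure mode with plain dictionaries. It does not call vLLM, and every parameter name other than `embed_tokens.weight` is a made-up placeholder.

```python
def find_unmatched_names(checkpoint_names, params_dict):
    """Return checkpoint weight names that have no entry in params_dict."""
    return [name for name in checkpoint_names if name not in params_dict]


# Hypothetical stand-ins: the real parameter names inside the drafter
# model are not shown in the log.
params_dict = {
    "fc.weight": object(),
    "norm.weight": object(),
}
checkpoint_names = ["fc.weight", "norm.weight", "embed_tokens.weight"]

unmatched = find_unmatched_names(checkpoint_names, params_dict)
print("names missing from params_dict:", unmatched)

# A direct lookup of a missing name raises the same exception type
# that appears in the log.
try:
    params_dict["embed_tokens.weight"]
except KeyError as exc:
    print("KeyError:", exc)
```

Listing the unmatched names first is only a debugging aid for confirming which checkpoint entries the loader does not recognize.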
Solved by installing the latest vLLM nightly wheel.