textgen webui CUDA memory error on clear cache
#6 · opened by Yhyu13
It seems to be an error in the Mixtral expert selection. Does anyone have the same issue? I just want to know whether this is a known bug for this model, or maybe a bug in the code.
I am on textgen webui https://github.com/oobabooga/text-generation-webui/commit/d8c3a5bee814f09b0868474002105dcf21a3ff1a
Ubuntu 20.04
RTX 3090
NVIDIA driver 545.23.08
Traceback (most recent call last):
File "/home/hangyu5/Documents/Gitrepo-My/text-generation-webui/modules/callbacks.py", line 61, in gentask
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/Documents/Gitrepo-My/text-generation-webui/modules/text_generation.py", line 376, in generate_with_callback
shared.model.generate(**kwargs)
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/generation/utils.py", line 1764, in generate
return self.sample(
^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/generation/utils.py", line 2861, in sample
outputs = self(
^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 1222, in forward
outputs = self.model(
^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 1090, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 819, in forward
hidden_states, router_logits = self.block_sparse_moe(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyu5/anaconda3/envs/textgen/lib/python3.11/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 736, in forward
idx, top_x = torch.where(expert_mask[expert_idx])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I believe it's a bug in the code.
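For context, the failing call sits inside Mixtral's sparse-MoE dispatch loop in block_sparse_moe. The sketch below is a simplified paraphrase of that routing pattern (not the exact modeling_mixtral.py code); shapes and names are illustrative only, and it just shows what torch.where(expert_mask[expert_idx]) is doing when the illegal access fires. Note also that, as the error text says, the kernel failure may have been launched earlier, so rerunning with CUDA_LAUNCH_BLOCKING=1 set would make the reported location reliable.

import torch
import torch.nn.functional as F

# Simplified sketch of the sparse-MoE routing step the traceback points at
# (a paraphrase for illustration, not the exact transformers implementation).
num_experts, top_k = 8, 2
num_tokens, hidden_dim = 5, 16

hidden_states = torch.randn(num_tokens, hidden_dim)
router_logits = torch.randn(num_tokens, num_experts)

routing_weights = F.softmax(router_logits, dim=-1)
routing_weights, selected_experts = torch.topk(routing_weights, top_k, dim=-1)

# One-hot mask with shape (num_experts, top_k, num_tokens): for each expert,
# which (slot, token) pairs were routed to it.
expert_mask = F.one_hot(selected_experts, num_classes=num_experts).permute(2, 1, 0)

for expert_idx in range(num_experts):
    # This is the call that raises the illegal memory access in the traceback:
    # idx = which top-k slot, top_x = which token indices hit this expert.
    idx, top_x = torch.where(expert_mask[expert_idx])
    tokens_for_expert = hidden_states[top_x]  # gather this expert's tokens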
These kinds of errors usually happen (especially on Linux) when you don't have enough VRAM available.
See this:
https://stackoverflow.com/questions/68106457/pytorch-cuda-error-an-illegal-memory-access-was-encountered
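If VRAM pressure is indeed the cause, one workaround on a single 24 GB card is to load the model quantized. Below is a minimal sketch using transformers with bitsandbytes; the model id and settings are assumptions for illustration, not taken from this thread.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical example: model id and quantization settings are assumptions.
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit weights to reduce VRAM use
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers; offloads if they don't fit
)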
TomGrc changed discussion status to closed