torch.compile error

#47
by LD-inform - opened

Hi, I have this problem when I am running the torch.compile code snippet:

Traceback (most recent call last):
File "/data/gemma_torch/torch_compile.py", line 40, in
outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
...
torch._dynamo.exc.Unsupported: reconstruct: UserDefinedObjectVariable(HybridCache)
from user code:
File "/home/.local/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 1111, in forward
return CausalLMOutputWithPast(

Versions:
torch: 2.3.1
transformers: 4.42.4
python: 3.10
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

GPU: NVIDIA L40S

Did anyone encounter this problem? What can I try to get it running?
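For reference, a minimal sketch of what the script does (the usual HybridCache + torch.compile generation pattern; the model id and cache size below are placeholders, not necessarily my exact file):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HybridCache

model_id = "google/gemma-2-2b-it"  # placeholder; I test both the 2B and 9B checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

# Compile the forward pass
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

model_inputs = tokenizer("Explain the difference between integrals and derivatives.",
                         return_tensors="pt").to(model.device)

# Pre-allocated static cache sized for prompt + generated tokens
past_key_values = HybridCache(
    config=model.config,
    max_batch_size=1,
    max_cache_len=model_inputs["input_ids"].shape[1] + 128,
    device=model.device,
    dtype=model.dtype,
)

# This is the call on line 40 of the traceback above
outputs = model.generate(**model_inputs, past_key_values=past_key_values,
                         do_sample=True, temperature=1.0, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```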

Google org
edited Nov 4

Hi @LD-inform, it seems there is an issue with the installed torch version 2.3.1. Could you please try again after upgrading the torch library to the latest version 2.5.1 using !pip install -U torch and let us know if the issue still persists?

Thank you @Renu11.

I had to add this parameter: torch.compile(..., backend="eager").
Without it, I got this error:

/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1465, in _call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Should never be installed
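Concretely, the change was along these lines (just a sketch of the relevant call; everything else in the snippet stays the same):

```python
# With the default inductor backend the "Should never be installed" error above is raised,
# so I fall back to the eager backend, which only captures the graph with Dynamo:
model.forward = torch.compile(model.forward, backend="eager")
```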

However, the speed increased only by about 5 t/s (from 44 t/s to 49 t/s) with the 2B model, and with the 9B model it even decreased from 26 t/s to 20 t/s.

I used it with a different prompt as well, but I doubt that is the problem:
input_text = "user\nExplain in details difference between integrals and derivatives.\nmodel"

Is there anything else I can try?
