Gr00t Model - phospho Training Pipeline

Error Traceback

We faced an issue while training your model.

Training process failed with exit code 1:

  File "/opt/conda/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 277, in apply_rotary_pos_emb
    q_embed = (q * cos) + (rotate_half(q) * sin)
                           ^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 252, in rotate_half
    return torch.cat((-x2, x1), dim=-1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 79.25 GiB of which 38.75 MiB is free. Process 1618625 has 79.21 GiB memory in use. Of the allocated memory 78.35 GiB is allocated by PyTorch, and 368.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

  0%|          | 0/860 [00:09<?, ?it/s]
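The error message itself suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but note that only ~368 MiB here is reserved-but-unallocated, so fragmentation is a minor factor: the 79 GiB GPU is simply full, and reducing memory pressure (smaller batch, gradient checkpointing; see the sketch under "Training parameters" below) is the more likely fix. As a minimal sketch for the allocator setting, assuming a plain Python entry point, the variable must be in the environment before the first CUDA allocation:

```python
import os

# Must be set before the first CUDA tensor is allocated, so the safest
# place is before importing torch (or exported in the shell that
# launches the training job).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after the env var so the allocator picks it up
```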

Training parameters:

  • Dataset: mlvynhrl/first_dataset
  • Wandb run URL: None
  • Epochs: 10
  • Batch size: 64
  • Training steps: 853
  • Train/test split: 1.0
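A batch size of 64 is what exhausted the ~79 GiB GPU above. A hedged sketch of the usual remedy, written against the Hugging Face Trainer API (the phospho pipeline may expose these knobs differently, and the values and output path here are illustrative, not taken from the run):

```python
from transformers import TrainingArguments

# Sketch: halving the per-device batch size while doubling gradient
# accumulation keeps the effective batch size at 64 but roughly halves
# peak activation memory; gradient checkpointing trades extra compute
# for further savings.
args = TrainingArguments(
    output_dir="gr00t-finetune",     # hypothetical placeholder path
    num_train_epochs=10,             # matches the run above
    per_device_train_batch_size=32,  # was 64 when the run went OOM
    gradient_accumulation_steps=2,   # 32 * 2 = effective batch of 64
    gradient_checkpointing=True,
)
```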
