Gr00t Model - phospho Training Pipeline

Error Traceback

We faced an issue while training your model.

Training process failed with exit code 1:

  File "/opt/conda/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 277, in apply_rotary_pos_emb
    q_embed = (q * cos) + (rotate_half(q) * sin)
                           ^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 252, in rotate_half
    return torch.cat((-x2, x1), dim=-1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 79.25 GiB of which 38.75 MiB is free. Process 1618625 has 79.21 GiB memory in use. Of the allocated memory 78.35 GiB is allocated by PyTorch, and 368.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

  0%|          | 0/860 [00:09<?, ?it/s]
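The error message itself suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but note that only ~368 MiB here is reserved-but-unallocated, so fragmentation is a minor factor: the 79 GiB GPU is simply full, and reducing memory pressure (smaller batch, gradient checkpointing; see the sketch under "Training parameters" below) is the more likely fix. As a minimal sketch for the allocator setting, assuming a plain Python entry point, the variable must be in the environment before the first CUDA allocation:

```python
import os

# Must be set before the first CUDA tensor is allocated, so the safest
# place is before importing torch (or exported in the shell that
# launches the training job).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after the env var so the allocator picks it up
```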

Training parameters:

  • Dataset: mlvynhrl/first_dataset
  • Wandb run URL: None
  • Epochs: 10
  • Batch size: 64
  • Training steps: 853
  • Train/test split: 1.0
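A batch size of 64 is what exhausted the ~79 GiB GPU above. A hedged sketch of the usual remedy, written against the Hugging Face Trainer API (the phospho pipeline may expose these knobs differently, and the values and output path here are illustrative, not taken from the run):

```python
from transformers import TrainingArguments

# Sketch: halving the per-device batch size while doubling gradient
# accumulation keeps the effective batch size at 64 but roughly halves
# peak activation memory; gradient checkpointing trades extra compute
# for further savings.
args = TrainingArguments(
    output_dir="gr00t-finetune",     # hypothetical placeholder path
    num_train_epochs=10,             # matches the run above
    per_device_train_batch_size=32,  # was 64 when the run went OOM
    gradient_accumulation_steps=2,   # 32 * 2 = effective batch of 64
    gradient_checkpointing=True,
)
```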
