Gr00t Model - phospho Training Pipeline

Error Traceback

We faced an issue while training your model.

Training process failed with exit code 1:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/transformers/activations.py", line 46, in forward
return nn.functional.gelu(input, approximate="tanh")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 102.00 MiB. GPU 0 has a total capacity of 79.15 GiB of which 98.12 MiB is free. Process 3696851 has 61.80 GiB memory in use. Process 4109944 has 17.24 GiB memory in use. Of the allocated memory 16.40 GiB is allocated by PyTorch, and 350.48 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

0%|          | 0/4402 [00:08<?, ?it/s]
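
The figures in the error message can be checked with a little arithmetic. The sketch below is a minimal illustration, not part of the phospho pipeline: it parses the message text copied from the traceback and sums the per-process usage. The helper names `to_gib` and `summarize_oom` are hypothetical. The reading that process 4109944 is the training run (its 17.24 GiB is close to the 16.40 GiB PyTorch reports as allocated) and that process 3696851 is a separate job on the same GPU is an assumption drawn from the numbers, not something the log states.

```python
import re

# Relevant part of the error message, copied from the traceback.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 102.00 MiB. "
    "GPU 0 has a total capacity of 79.15 GiB of which 98.12 MiB is free. "
    "Process 3696851 has 61.80 GiB memory in use. "
    "Process 4109944 has 17.24 GiB memory in use. "
    "Of the allocated memory 16.40 GiB is allocated by PyTorch, "
    "and 350.48 MiB is reserved by PyTorch but unallocated."
)


def to_gib(value, unit):
    """Convert a (number, unit) pair from the message to GiB."""
    return float(value) / 1024 if unit == "MiB" else float(value)


def summarize_oom(message):
    """Extract total, free, requested and per-process memory (in GiB)."""
    size = r"([\d.]+) (MiB|GiB)"
    total = re.search(r"total capacity of " + size, message)
    free = re.search(r"of which " + size + r" is free", message)
    requested = re.search(r"Tried to allocate " + size, message)
    processes = re.findall(r"Process (\d+) has " + size + r" memory in use", message)
    return {
        "total_gib": to_gib(*total.groups()),
        "free_gib": to_gib(*free.groups()),
        "requested_gib": to_gib(*requested.groups()),
        "per_process_gib": {int(pid): to_gib(v, u) for pid, v, u in processes},
    }


summary = summarize_oom(OOM_MESSAGE)
used = sum(summary["per_process_gib"].values())
print(f"Used by listed processes: {used:.2f} of {summary['total_gib']:.2f} GiB")
for pid, gib in summary["per_process_gib"].items():
    print(f"  process {pid}: {gib:.2f} GiB")
```

The two listed processes together account for almost the entire card, so the 102.00 MiB request could not fit in the 98.12 MiB that remained. Only about 350 MiB is reserved but unallocated, so the `expandable_segments:True` setting suggested in the message, which targets fragmentation, is unlikely to help much here. If the assumption above holds, freeing the GPU from the other process or using a smaller batch size are the more plausible remedies.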

Training parameters:

  • Dataset: advpatel/foldshirt
  • Wandb run URL: None
  • Epochs: 1
  • Batch size: 16
  • Training steps: 4402

More:
