Gr00t Model - phospho Training Pipeline
Error Traceback
We faced an issue while training your model.
Training process failed with exit code 1:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/transformers/activations.py", line 46, in forward
return nn.functional.gelu(input, approximate="tanh")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 102.00 MiB. GPU 0 has a total capacity of 79.15 GiB of which 98.12 MiB is free. Process 3696851 has 61.80 GiB memory in use. Process 4109944 has 17.24 GiB memory in use. Of the allocated memory 16.40 GiB is allocated by PyTorch, and 350.48 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
0%| | 0/4402 [00:08<?, ?it/s]
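The figures in the error message above suggest the GPU was shared with another process when training started. As a rough sketch of that accounting, the snippet below adds up only the numbers quoted in the message; the variable names are illustrative and not part of the phospho pipeline. It is an assumption, not something the message states, that process 4109944 is the training job: its 17.24 GiB is close to the PyTorch-allocated plus reserved memory, with the remainder plausibly being CUDA context overhead.

```python
# Figures copied from the CUDA out-of-memory message above.
MIB_PER_GIB = 1024

total_gib = 79.15                      # total GPU capacity
free_mib = 98.12                       # free memory at the time of failure
requested_mib = 102.00                 # size of the failed allocation
process_usage_gib = {3696851: 61.80, 4109944: 17.24}
torch_allocated_gib = 16.40            # allocated by PyTorch in the failing process
torch_reserved_unallocated_mib = 350.48

# Memory held by the two processes listed in the message.
used_gib = sum(process_usage_gib.values())

# How far the failed allocation was from fitting.
shortfall_mib = requested_mib - free_mib

# The process holding the most memory, and its share of the card.
largest_pid = max(process_usage_gib, key=process_usage_gib.get)
largest_share = process_usage_gib[largest_pid] / total_gib

# PyTorch's own footprint (allocated + reserved but unallocated), in GiB.
torch_footprint_gib = torch_allocated_gib + torch_reserved_unallocated_mib / MIB_PER_GIB

print(f"Held by listed processes: {used_gib:.2f} of {total_gib:.2f} GiB")
print(f"Allocation shortfall: {shortfall_mib:.2f} MiB")
print(f"Largest holder: PID {largest_pid} ({largest_share:.0%} of the GPU)")
print(f"PyTorch footprint in the failing process: {torch_footprint_gib:.2f} GiB")
```

Read this way, the failed allocation missed by only a few MiB while a separate process held most of the card, so freeing the GPU or retrying on an unshared one is worth trying before changing any training parameters. The reserved-but-unallocated amount is small here, so the `expandable_segments` setting mentioned in the message may not help much.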
Training parameters:
- Dataset: advpatel/foldshirt
- Wandb run URL: None
- Epochs: 1
- Batch size: 16
- Training steps: 4402
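If the run is retried with a smaller batch size to lower peak memory, the step count changes accordingly. The sketch below is only arithmetic on the parameters listed above, and it assumes that training steps equal the number of batches per epoch rounded up, times the number of epochs; the pipeline may compute steps differently. The helper `steps_for` is hypothetical.

```python
import math

# Parameters as listed above.
batch_size = 16
training_steps = 4402
epochs = 1

# Assumption: steps = ceil(samples / batch_size) * epochs.
steps_per_epoch = training_steps // epochs

# Under that assumption the dataset size lies in this range.
max_samples = steps_per_epoch * batch_size
min_samples = (steps_per_epoch - 1) * batch_size + 1


def steps_for(batch: int, samples: int, n_epochs: int = 1) -> int:
    """Hypothetical helper: step count for a given batch size."""
    return math.ceil(samples / batch) * n_epochs


print(f"Implied dataset size: {min_samples} to {max_samples} samples")
for candidate in (16, 8, 4):
    print(f"batch size {candidate}: about {steps_for(candidate, max_samples, epochs)} steps")
```

Halving the batch size roughly doubles the number of steps for the same single epoch, so training time per epoch may rise even if the run then fits in memory.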
More:
Get Started: docs.phospho.ai
Get your robot: robots.phospho.ai
Explore on Replicate: Replicate