Add memory calculation for ZeRO stages
#14, opened by deleted
Training large models with the Adam optimizer consumes a lot of memory, so some people train them with a framework like DeepSpeed, which implements the ZeRO algorithms (ZeRO: Memory Optimizations Toward Training Trillion Parameter Models) to save memory. It would be really appreciated if you could provide the memory usage for the different ZeRO stages, since measuring this experimentally is expensive.
We are looking into the possibility of doing this :)
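In the meantime, here is a rough sketch of the per-GPU model-state estimate described in the ZeRO paper, assuming mixed-precision Adam training: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and K = 12 bytes for the fp32 optimizer states (master weights, momentum, variance). The function name `zero_model_state_bytes` and its arguments are made up for illustration and are not part of this Space or of DeepSpeed. Activations, temporary buffers, and fragmentation are not included, so real usage will be higher.

```python
def zero_model_state_bytes(num_params, num_gpus, stage, k=12):
    """Estimate per-GPU bytes for model states under a given ZeRO stage.

    Hypothetical helper. Assumes mixed-precision Adam as in the ZeRO paper:
    fp16 parameters (2 bytes), fp16 gradients (2 bytes), and k bytes of
    fp32 optimizer state per parameter. Stage 0 means no partitioning.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    params = 2 * num_params      # fp16 weights
    grads = 2 * num_params       # fp16 gradients
    optim = k * num_params       # fp32 master weights + Adam moments

    if stage >= 1:               # stage 1 partitions optimizer states
        optim /= num_gpus
    if stage >= 2:               # stage 2 also partitions gradients
        grads /= num_gpus
    if stage >= 3:               # stage 3 also partitions parameters
        params /= num_gpus

    return params + grads + optim


if __name__ == "__main__":
    # 7.5B parameters on 64 GPUs, the worked example used in the ZeRO paper.
    for stage in range(4):
        gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
        print(f"ZeRO stage {stage}: {gb:.1f} GB per GPU")
```

For 7.5B parameters on 64 GPUs this gives roughly 120 GB, 31.4 GB, 16.6 GB, and 1.9 GB for stages 0 to 3, which matches the figures quoted in the paper (using 1 GB = 1e9 bytes).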