I think it is from the Tensor Programs V paper. Just curious how to train it: DeepSpeed with a custom optimizer raises an exception.