The gradient_checkpointing option is always False in the BigBirdPegasusEncoder and BigBirdPegasusDecoder classes
#9, opened by hseom
Thanks for this great work!
I found that in the BigBirdPegasusEncoder and BigBirdPegasusDecoder classes, the gradient_checkpointing option is always set to False, so GPU memory usage keeps growing during training.
Please make it optional again :)
# modeling_bigbird_pegasus.py, line 1768~
# BigBirdPegasusEncoder class
...
...
self.layers = nn.ModuleList([BigBirdPegasusEncoderLayer(config, seed=i) for i in range(config.encoder_layers)])
self.layernorm_embedding = nn.LayerNorm(embed_dim)
self.gradient_checkpointing = False
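
For context, here is a minimal sketch of how a flag like this usually gates checkpointing inside a layer loop. `ToyEncoder` and its layer sizes are hypothetical stand-ins rather than the BigBirdPegasus code, and the sketch assumes a PyTorch version whose `torch.utils.checkpoint.checkpoint` accepts `use_reentrant`.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for an encoder; not the BigBirdPegasus code."""

    def __init__(self, hidden_size=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)]
        )
        # Default value only: the flag can be flipped after construction.
        self.gradient_checkpointing = False

    def forward(self, hidden_states):
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # Do not keep this layer's activations; recompute them
                # during the backward pass to save memory.
                hidden_states = checkpoint(
                    layer, hidden_states, use_reentrant=False
                )
            else:
                hidden_states = layer(hidden_states)
            hidden_states = torch.tanh(hidden_states)
        return hidden_states


torch.manual_seed(0)
encoder = ToyEncoder().train()
inputs = torch.randn(4, 16, requires_grad=True)

# Forward pass with the default (checkpointing off).
plain_output = encoder(inputs)

# Flip the flag and run again; the result should match the plain pass.
encoder.gradient_checkpointing = True
checkpointed_output = encoder(inputs)
checkpointed_output.sum().backward()
```

With the flag hard-coded to False and nothing flipping it later, the checkpointed branch is never taken. In the transformers library, `PreTrainedModel.gradient_checkpointing_enable()` is the documented way to turn checkpointing on for models that support it; whether it reaches the code quoted above has not been verified here.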