Performance oddities of the 3M model
#3 opened 10 months ago by MartialTerran
GPT-2 model having 16 4-float attention heads
#2 opened 10 months ago by MartialTerran
Adding `safetensors` variant of this model
#1 opened over 1 year ago by SFconvertbot
