TinyStories (GPT-Neo)
Model set for Linguistic Collapse: Neural Collapse in (Large) Language Models (NeurIPS 2024, arXiv:2405.17767), leveraging neural collapse, GPT-Neo, and TinyStories.
Code found at https://github.com/rhubarbwu/linguistic-collapse/.
Past Model Family Paths
While running experiments for the paper, models were named on an ephemeral basis in the format `LLxdddd_EEb`, where `LL` is the number of layers, `dddd` is the hidden dimension, `EE` is the number of epochs, and `b` is the level of regularization (weight decay).
Most of these models no longer exist on Huggingface.
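As a rough illustration of the naming convention, the snippet below parses such a repo name into its components. The helper function and the suffix-to-weight-decay mapping (`n`, `d`, `L` for the three regularization levels listed below) are assumptions inferred from the lists in this card, not utilities from the official repository.

```python
import re

# Hypothetical helper (not part of the linguistic-collapse repo): parse an
# ephemeral model name of the form LLxdddd_EEb, e.g. "TinyStories-02x0768_10d".
NAME_RE = re.compile(r"TinyStories-(\d{2})x(\d{4})_(\d{2})([ndL])")

# Suffix letters as used in the lists below (assumed mapping):
# n = no weight decay, d = minimal weight decay (0.0005), L = full weight decay (0.1).
WEIGHT_DECAY = {"n": 0.0, "d": 0.0005, "L": 0.1}

def parse_name(repo_id: str) -> dict:
    """Extract layers, hidden dimension, epochs, and weight decay from a repo name."""
    match = NAME_RE.search(repo_id)
    if match is None:
        raise ValueError(f"unrecognized model name: {repo_id}")
    layers, hidden, epochs, reg = match.groups()
    return {
        "layers": int(layers),
        "hidden_dim": int(hidden),
        "epochs": int(epochs),
        "weight_decay": WEIGHT_DECAY[reg],
    }

print(parse_name("rhubarbwu/TinyStories-02x0768_10d"))
# {'layers': 2, 'hidden_dim': 768, 'epochs': 10, 'weight_decay': 0.0005}
```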
Models trained with no weight decay (`b=0`, suffix `n`).
rhubarbwu/TinyStories-01x0064_01n rhubarbwu/TinyStories-04x0512_01n
rhubarbwu/TinyStories-01x0064_10n rhubarbwu/TinyStories-04x0512_10n
rhubarbwu/TinyStories-01x0128_01n rhubarbwu/TinyStories-04x0768_01n
rhubarbwu/TinyStories-01x0128_10n rhubarbwu/TinyStories-04x0768_10n
rhubarbwu/TinyStories-01x0256_01n rhubarbwu/TinyStories-04x1024_01n
rhubarbwu/TinyStories-01x0256_10n rhubarbwu/TinyStories-04x1024_10n
rhubarbwu/TinyStories-01x0512_01n rhubarbwu/TinyStories-08x0064_01n
rhubarbwu/TinyStories-01x0512_10n rhubarbwu/TinyStories-08x0064_10n
rhubarbwu/TinyStories-01x0768_01n rhubarbwu/TinyStories-08x0128_01n
rhubarbwu/TinyStories-01x0768_10n rhubarbwu/TinyStories-08x0128_10n
rhubarbwu/TinyStories-01x1024_01n rhubarbwu/TinyStories-08x0256_01n
rhubarbwu/TinyStories-01x1024_10n rhubarbwu/TinyStories-08x0256_10n
rhubarbwu/TinyStories-02x0064_01n rhubarbwu/TinyStories-08x0512_01n
rhubarbwu/TinyStories-02x0064_10n rhubarbwu/TinyStories-08x0512_10n
rhubarbwu/TinyStories-02x0128_01n rhubarbwu/TinyStories-08x0768_01n
rhubarbwu/TinyStories-02x0128_10n rhubarbwu/TinyStories-08x0768_10n
rhubarbwu/TinyStories-02x0256_01n rhubarbwu/TinyStories-08x1024_01n
rhubarbwu/TinyStories-02x0256_10n rhubarbwu/TinyStories-08x1024_10n
rhubarbwu/TinyStories-02x0512_01n rhubarbwu/TinyStories-12x0064_01n
rhubarbwu/TinyStories-02x0512_10n rhubarbwu/TinyStories-12x0064_10n
rhubarbwu/TinyStories-02x0768_01n rhubarbwu/TinyStories-12x0128_01n
rhubarbwu/TinyStories-02x0768_10n rhubarbwu/TinyStories-12x0128_10n
rhubarbwu/TinyStories-02x1024_01n rhubarbwu/TinyStories-12x0256_01n
rhubarbwu/TinyStories-02x1024_10n rhubarbwu/TinyStories-12x0256_10n
rhubarbwu/TinyStories-04x0064_01n rhubarbwu/TinyStories-12x0512_01n
rhubarbwu/TinyStories-04x0064_10n rhubarbwu/TinyStories-12x0512_10n
rhubarbwu/TinyStories-04x0128_01n rhubarbwu/TinyStories-12x0768_01n
rhubarbwu/TinyStories-04x0128_10n rhubarbwu/TinyStories-12x0768_10n
rhubarbwu/TinyStories-04x0256_01n rhubarbwu/TinyStories-12x1024_01n
rhubarbwu/TinyStories-04x0256_10n rhubarbwu/TinyStories-12x1024_10n
Models trained with minimal weight decay (`b=0.0005`, suffix `d`).
rhubarbwu/TinyStories-01x0064_01d rhubarbwu/TinyStories-04x0512_01d
rhubarbwu/TinyStories-01x0064_10d rhubarbwu/TinyStories-04x0512_10d
rhubarbwu/TinyStories-01x0128_01d rhubarbwu/TinyStories-04x0768_01d
rhubarbwu/TinyStories-01x0128_10d rhubarbwu/TinyStories-04x0768_10d
rhubarbwu/TinyStories-01x0256_01d rhubarbwu/TinyStories-04x1024_01d
rhubarbwu/TinyStories-01x0256_10d rhubarbwu/TinyStories-04x1024_10d
rhubarbwu/TinyStories-01x0512_01d rhubarbwu/TinyStories-08x0064_01d
rhubarbwu/TinyStories-01x0512_10d rhubarbwu/TinyStories-08x0064_10d
rhubarbwu/TinyStories-01x0768_01d rhubarbwu/TinyStories-08x0128_01d
rhubarbwu/TinyStories-01x0768_10d rhubarbwu/TinyStories-08x0128_10d
rhubarbwu/TinyStories-01x1024_01d rhubarbwu/TinyStories-08x0256_01d
rhubarbwu/TinyStories-01x1024_10d rhubarbwu/TinyStories-08x0256_10d
rhubarbwu/TinyStories-02x0064_01d rhubarbwu/TinyStories-08x0512_01d
rhubarbwu/TinyStories-02x0064_10d rhubarbwu/TinyStories-08x0512_10d
rhubarbwu/TinyStories-02x0128_01d rhubarbwu/TinyStories-08x0768_01d
rhubarbwu/TinyStories-02x0128_10d rhubarbwu/TinyStories-08x0768_10d
rhubarbwu/TinyStories-02x0256_01d rhubarbwu/TinyStories-08x1024_01d
rhubarbwu/TinyStories-02x0256_10d rhubarbwu/TinyStories-08x1024_10d
rhubarbwu/TinyStories-02x0512_01d rhubarbwu/TinyStories-12x0064_01d
rhubarbwu/TinyStories-02x0512_10d rhubarbwu/TinyStories-12x0064_10d
rhubarbwu/TinyStories-02x0768_01d rhubarbwu/TinyStories-12x0128_01d
rhubarbwu/TinyStories-02x0768_10d rhubarbwu/TinyStories-12x0128_10d
rhubarbwu/TinyStories-02x1024_01d rhubarbwu/TinyStories-12x0256_01d
rhubarbwu/TinyStories-02x1024_10d rhubarbwu/TinyStories-12x0256_10d
rhubarbwu/TinyStories-04x0064_01d rhubarbwu/TinyStories-12x0512_01d
rhubarbwu/TinyStories-04x0064_10d rhubarbwu/TinyStories-12x0512_10d
rhubarbwu/TinyStories-04x0128_01d rhubarbwu/TinyStories-12x0768_01d
rhubarbwu/TinyStories-04x0128_10d rhubarbwu/TinyStories-12x0768_10d
rhubarbwu/TinyStories-04x0256_01d rhubarbwu/TinyStories-12x1024_01d
rhubarbwu/TinyStories-04x0256_10d rhubarbwu/TinyStories-12x1024_10d
// duplicates of TinyStories-02x0768_01d for permutation test
rhubarbwu/TinyStories-02x0768_01d00 rhubarbwu/TinyStories-02x0768_01d10
rhubarbwu/TinyStories-02x0768_01d01 rhubarbwu/TinyStories-02x0768_01d11
rhubarbwu/TinyStories-02x0768_01d02 rhubarbwu/TinyStories-02x0768_01d12
rhubarbwu/TinyStories-02x0768_01d03 rhubarbwu/TinyStories-02x0768_01d13
rhubarbwu/TinyStories-02x0768_01d04 rhubarbwu/TinyStories-02x0768_01d14
rhubarbwu/TinyStories-02x0768_01d05 rhubarbwu/TinyStories-02x0768_01d15
rhubarbwu/TinyStories-02x0768_01d06 rhubarbwu/TinyStories-02x0768_01d16
rhubarbwu/TinyStories-02x0768_01d07 rhubarbwu/TinyStories-02x0768_01d17
rhubarbwu/TinyStories-02x0768_01d08 rhubarbwu/TinyStories-02x0768_01d18
rhubarbwu/TinyStories-02x0768_01d09 rhubarbwu/TinyStories-02x0768_01d19
Models trained with full weight decay (`b=0.1`, suffix `L`).
rhubarbwu/TinyStories-01x0064_01L rhubarbwu/TinyStories-04x0512_01L
rhubarbwu/TinyStories-01x0064_10L rhubarbwu/TinyStories-04x0512_10L
rhubarbwu/TinyStories-01x0128_01L rhubarbwu/TinyStories-04x0768_01L
rhubarbwu/TinyStories-01x0128_10L rhubarbwu/TinyStories-04x0768_10L
rhubarbwu/TinyStories-01x0256_01L rhubarbwu/TinyStories-04x1024_01L
rhubarbwu/TinyStories-01x0256_10L rhubarbwu/TinyStories-04x1024_10L
rhubarbwu/TinyStories-01x0512_01L rhubarbwu/TinyStories-08x0064_01L
rhubarbwu/TinyStories-01x0512_10L rhubarbwu/TinyStories-08x0064_10L
rhubarbwu/TinyStories-01x0768_01L rhubarbwu/TinyStories-08x0128_01L
rhubarbwu/TinyStories-01x0768_10L rhubarbwu/TinyStories-08x0128_10L
rhubarbwu/TinyStories-01x1024_01L rhubarbwu/TinyStories-08x0256_01L
rhubarbwu/TinyStories-01x1024_10L rhubarbwu/TinyStories-08x0256_10L
rhubarbwu/TinyStories-02x0064_01L rhubarbwu/TinyStories-08x0512_01L
rhubarbwu/TinyStories-02x0064_10L rhubarbwu/TinyStories-08x0512_10L
rhubarbwu/TinyStories-02x0128_01L rhubarbwu/TinyStories-08x0768_01L
rhubarbwu/TinyStories-02x0128_10L rhubarbwu/TinyStories-08x0768_10L
rhubarbwu/TinyStories-02x0256_01L rhubarbwu/TinyStories-08x1024_01L
rhubarbwu/TinyStories-02x0256_10L rhubarbwu/TinyStories-08x1024_10L
rhubarbwu/TinyStories-02x0512_01L rhubarbwu/TinyStories-12x0064_01L
rhubarbwu/TinyStories-02x0512_10L rhubarbwu/TinyStories-12x0064_10L
rhubarbwu/TinyStories-02x0768_01L rhubarbwu/TinyStories-12x0128_01L
rhubarbwu/TinyStories-02x0768_10L rhubarbwu/TinyStories-12x0128_10L
rhubarbwu/TinyStories-02x1024_01L rhubarbwu/TinyStories-12x0256_01L
rhubarbwu/TinyStories-02x1024_10L rhubarbwu/TinyStories-12x0256_10L
rhubarbwu/TinyStories-04x0064_01L rhubarbwu/TinyStories-12x0512_01L
rhubarbwu/TinyStories-04x0064_10L rhubarbwu/TinyStories-12x0512_10L
rhubarbwu/TinyStories-04x0128_01L rhubarbwu/TinyStories-12x0768_01L
rhubarbwu/TinyStories-04x0128_10L rhubarbwu/TinyStories-12x0768_10L
rhubarbwu/TinyStories-04x0256_01L rhubarbwu/TinyStories-12x1024_01L
rhubarbwu/TinyStories-04x0256_10L rhubarbwu/TinyStories-12x1024_10L // still exists on Huggingface
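For completeness, here is a minimal sketch of loading one of the remaining checkpoints with the Hugging Face `transformers` library. This is not code from the paper's repository; the repo id is the one annotated above as still existing, and it may also be removed in the future.

```python
# Minimal sketch (assumption, not from the linguistic-collapse repo): load one
# of the remaining TinyStories GPT-Neo checkpoints and sample a continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "rhubarbwu/TinyStories-12x1024_10L"  # annotated above as still existing

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```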