TinyStories (GPT-Neo)

Model set for Linguistic Collapse: Neural Collapse in (Large) Language Models (NeurIPS 2024, arXiv:2405.17767), leveraging neural-collapse, GPT-Neo and TinyStories.

Code found at https://github.com/rhubarbwu/linguistic-collapse/.

Past Model Family Paths

While running experiments for the paper, models were named on an ephemeral basis in the format `LLxdddd_EEb`, where `LL` is the number of layers, `dddd` is the hidden dimension, `EE` is the number of epochs, and `b` is the level of regularization (weight decay). In the paths listed here, `b` appears as a suffix letter: `n`, `d`, or `L`. Most of these models no longer exist on Hugging Face.
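As an illustration of the naming format, here is a minimal sketch of a parser for these paths. The names `ModelSpec`, `parse_model_name`, and `WEIGHT_DECAY` are hypothetical helpers, not part of the linked repository. The suffix-to-weight-decay mapping is inferred from the three listings in this card (`n` for `b=0`, `d` for `b=0.0005`, `L` for `b=0.1`), and the optional trailing two digits are assumed to index the permutation-test duplicates.

```python
import re
from typing import NamedTuple, Optional

# Suffix letter -> weight decay, inferred from the listings in this card
# (an assumption for illustration, not an official mapping).
WEIGHT_DECAY = {"n": 0.0, "d": 0.0005, "L": 0.1}

# LLxdddd_EEb, with an optional "rhubarbwu/" prefix and an optional
# two-digit index used by the permutation-test duplicates.
NAME_RE = re.compile(
    r"^(?:rhubarbwu/)?TinyStories-"
    r"(?P<layers>\d{2})x(?P<dim>\d{4})_(?P<epochs>\d{2})"
    r"(?P<reg>[ndL])(?P<dup>\d{2})?$"
)


class ModelSpec(NamedTuple):
    layers: int
    hidden_dim: int
    epochs: int
    weight_decay: float
    duplicate: Optional[int]  # index of a permutation-test duplicate, if any


def parse_model_name(name: str) -> ModelSpec:
    """Split a model path such as 'rhubarbwu/TinyStories-12x1024_10L' into its fields."""
    m = NAME_RE.match(name.strip())
    if m is None:
        raise ValueError(f"not a recognised TinyStories model path: {name!r}")
    dup = m["dup"]
    return ModelSpec(
        layers=int(m["layers"]),
        hidden_dim=int(m["dim"]),
        epochs=int(m["epochs"]),
        weight_decay=WEIGHT_DECAY[m["reg"]],
        duplicate=int(dup) if dup is not None else None,
    )


print(parse_model_name("rhubarbwu/TinyStories-12x1024_10L"))
```

Returning a named tuple keeps the parsed fields easy to sort or group on, for example when collecting results by depth or width.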

Models trained with no weight decay (`b=0`).
rhubarbwu/TinyStories-01x0064_01n  rhubarbwu/TinyStories-04x0512_01n
rhubarbwu/TinyStories-01x0064_10n  rhubarbwu/TinyStories-04x0512_10n
rhubarbwu/TinyStories-01x0128_01n  rhubarbwu/TinyStories-04x0768_01n
rhubarbwu/TinyStories-01x0128_10n  rhubarbwu/TinyStories-04x0768_10n
rhubarbwu/TinyStories-01x0256_01n  rhubarbwu/TinyStories-04x1024_01n
rhubarbwu/TinyStories-01x0256_10n  rhubarbwu/TinyStories-04x1024_10n
rhubarbwu/TinyStories-01x0512_01n  rhubarbwu/TinyStories-08x0064_01n
rhubarbwu/TinyStories-01x0512_10n  rhubarbwu/TinyStories-08x0064_10n
rhubarbwu/TinyStories-01x0768_01n  rhubarbwu/TinyStories-08x0128_01n
rhubarbwu/TinyStories-01x0768_10n  rhubarbwu/TinyStories-08x0128_10n
rhubarbwu/TinyStories-01x1024_01n  rhubarbwu/TinyStories-08x0256_01n
rhubarbwu/TinyStories-01x1024_10n  rhubarbwu/TinyStories-08x0256_10n
rhubarbwu/TinyStories-02x0064_01n  rhubarbwu/TinyStories-08x0512_01n
rhubarbwu/TinyStories-02x0064_10n  rhubarbwu/TinyStories-08x0512_10n
rhubarbwu/TinyStories-02x0128_01n  rhubarbwu/TinyStories-08x0768_01n
rhubarbwu/TinyStories-02x0128_10n  rhubarbwu/TinyStories-08x0768_10n
rhubarbwu/TinyStories-02x0256_01n  rhubarbwu/TinyStories-08x1024_01n
rhubarbwu/TinyStories-02x0256_10n  rhubarbwu/TinyStories-08x1024_10n
rhubarbwu/TinyStories-02x0512_01n  rhubarbwu/TinyStories-12x0064_01n
rhubarbwu/TinyStories-02x0512_10n  rhubarbwu/TinyStories-12x0064_10n
rhubarbwu/TinyStories-02x0768_01n  rhubarbwu/TinyStories-12x0128_01n
rhubarbwu/TinyStories-02x0768_10n  rhubarbwu/TinyStories-12x0128_10n
rhubarbwu/TinyStories-02x1024_01n  rhubarbwu/TinyStories-12x0256_01n
rhubarbwu/TinyStories-02x1024_10n  rhubarbwu/TinyStories-12x0256_10n
rhubarbwu/TinyStories-04x0064_01n  rhubarbwu/TinyStories-12x0512_01n
rhubarbwu/TinyStories-04x0064_10n  rhubarbwu/TinyStories-12x0512_10n
rhubarbwu/TinyStories-04x0128_01n  rhubarbwu/TinyStories-12x0768_01n
rhubarbwu/TinyStories-04x0128_10n  rhubarbwu/TinyStories-12x0768_10n
rhubarbwu/TinyStories-04x0256_01n  rhubarbwu/TinyStories-12x1024_01n
rhubarbwu/TinyStories-04x0256_10n  rhubarbwu/TinyStories-12x1024_10n
Models trained with minimal weight decay (`b=0.0005`).
rhubarbwu/TinyStories-01x0064_01d  rhubarbwu/TinyStories-04x0512_01d
rhubarbwu/TinyStories-01x0064_10d  rhubarbwu/TinyStories-04x0512_10d
rhubarbwu/TinyStories-01x0128_01d  rhubarbwu/TinyStories-04x0768_01d
rhubarbwu/TinyStories-01x0128_10d  rhubarbwu/TinyStories-04x0768_10d
rhubarbwu/TinyStories-01x0256_01d  rhubarbwu/TinyStories-04x1024_01d
rhubarbwu/TinyStories-01x0256_10d  rhubarbwu/TinyStories-04x1024_10d
rhubarbwu/TinyStories-01x0512_01d  rhubarbwu/TinyStories-08x0064_01d
rhubarbwu/TinyStories-01x0512_10d  rhubarbwu/TinyStories-08x0064_10d
rhubarbwu/TinyStories-01x0768_01d  rhubarbwu/TinyStories-08x0128_01d
rhubarbwu/TinyStories-01x0768_10d  rhubarbwu/TinyStories-08x0128_10d
rhubarbwu/TinyStories-01x1024_01d  rhubarbwu/TinyStories-08x0256_01d
rhubarbwu/TinyStories-01x1024_10d  rhubarbwu/TinyStories-08x0256_10d
rhubarbwu/TinyStories-02x0064_01d  rhubarbwu/TinyStories-08x0512_01d
rhubarbwu/TinyStories-02x0064_10d  rhubarbwu/TinyStories-08x0512_10d
rhubarbwu/TinyStories-02x0128_01d  rhubarbwu/TinyStories-08x0768_01d
rhubarbwu/TinyStories-02x0128_10d  rhubarbwu/TinyStories-08x0768_10d
rhubarbwu/TinyStories-02x0256_01d  rhubarbwu/TinyStories-08x1024_01d
rhubarbwu/TinyStories-02x0256_10d  rhubarbwu/TinyStories-08x1024_10d
rhubarbwu/TinyStories-02x0512_01d  rhubarbwu/TinyStories-12x0064_01d
rhubarbwu/TinyStories-02x0512_10d  rhubarbwu/TinyStories-12x0064_10d
rhubarbwu/TinyStories-02x0768_01d  rhubarbwu/TinyStories-12x0128_01d
rhubarbwu/TinyStories-02x0768_10d  rhubarbwu/TinyStories-12x0128_10d
rhubarbwu/TinyStories-02x1024_01d  rhubarbwu/TinyStories-12x0256_01d
rhubarbwu/TinyStories-02x1024_10d  rhubarbwu/TinyStories-12x0256_10d
rhubarbwu/TinyStories-04x0064_01d  rhubarbwu/TinyStories-12x0512_01d
rhubarbwu/TinyStories-04x0064_10d  rhubarbwu/TinyStories-12x0512_10d
rhubarbwu/TinyStories-04x0128_01d  rhubarbwu/TinyStories-12x0768_01d
rhubarbwu/TinyStories-04x0128_10d  rhubarbwu/TinyStories-12x0768_10d
rhubarbwu/TinyStories-04x0256_01d  rhubarbwu/TinyStories-12x1024_01d
rhubarbwu/TinyStories-04x0256_10d  rhubarbwu/TinyStories-12x1024_10d

// duplicates of TinyStories-02x0768_01d for permutation test
rhubarbwu/TinyStories-02x0768_01d00  rhubarbwu/TinyStories-02x0768_01d10
rhubarbwu/TinyStories-02x0768_01d01  rhubarbwu/TinyStories-02x0768_01d11
rhubarbwu/TinyStories-02x0768_01d02  rhubarbwu/TinyStories-02x0768_01d12
rhubarbwu/TinyStories-02x0768_01d03  rhubarbwu/TinyStories-02x0768_01d13
rhubarbwu/TinyStories-02x0768_01d04  rhubarbwu/TinyStories-02x0768_01d14
rhubarbwu/TinyStories-02x0768_01d05  rhubarbwu/TinyStories-02x0768_01d15
rhubarbwu/TinyStories-02x0768_01d06  rhubarbwu/TinyStories-02x0768_01d16
rhubarbwu/TinyStories-02x0768_01d07  rhubarbwu/TinyStories-02x0768_01d17
rhubarbwu/TinyStories-02x0768_01d08  rhubarbwu/TinyStories-02x0768_01d18
rhubarbwu/TinyStories-02x0768_01d09  rhubarbwu/TinyStories-02x0768_01d19
Models trained with full weight decay (`b=0.1`).
rhubarbwu/TinyStories-01x0064_01L  rhubarbwu/TinyStories-04x0512_01L
rhubarbwu/TinyStories-01x0064_10L  rhubarbwu/TinyStories-04x0512_10L
rhubarbwu/TinyStories-01x0128_01L  rhubarbwu/TinyStories-04x0768_01L
rhubarbwu/TinyStories-01x0128_10L  rhubarbwu/TinyStories-04x0768_10L
rhubarbwu/TinyStories-01x0256_01L  rhubarbwu/TinyStories-04x1024_01L
rhubarbwu/TinyStories-01x0256_10L  rhubarbwu/TinyStories-04x1024_10L
rhubarbwu/TinyStories-01x0512_01L  rhubarbwu/TinyStories-08x0064_01L
rhubarbwu/TinyStories-01x0512_10L  rhubarbwu/TinyStories-08x0064_10L
rhubarbwu/TinyStories-01x0768_01L  rhubarbwu/TinyStories-08x0128_01L
rhubarbwu/TinyStories-01x0768_10L  rhubarbwu/TinyStories-08x0128_10L
rhubarbwu/TinyStories-01x1024_01L  rhubarbwu/TinyStories-08x0256_01L
rhubarbwu/TinyStories-01x1024_10L  rhubarbwu/TinyStories-08x0256_10L
rhubarbwu/TinyStories-02x0064_01L  rhubarbwu/TinyStories-08x0512_01L
rhubarbwu/TinyStories-02x0064_10L  rhubarbwu/TinyStories-08x0512_10L
rhubarbwu/TinyStories-02x0128_01L  rhubarbwu/TinyStories-08x0768_01L
rhubarbwu/TinyStories-02x0128_10L  rhubarbwu/TinyStories-08x0768_10L
rhubarbwu/TinyStories-02x0256_01L  rhubarbwu/TinyStories-08x1024_01L
rhubarbwu/TinyStories-02x0256_10L  rhubarbwu/TinyStories-08x1024_10L
rhubarbwu/TinyStories-02x0512_01L  rhubarbwu/TinyStories-12x0064_01L
rhubarbwu/TinyStories-02x0512_10L  rhubarbwu/TinyStories-12x0064_10L
rhubarbwu/TinyStories-02x0768_01L  rhubarbwu/TinyStories-12x0128_01L
rhubarbwu/TinyStories-02x0768_10L  rhubarbwu/TinyStories-12x0128_10L
rhubarbwu/TinyStories-02x1024_01L  rhubarbwu/TinyStories-12x0256_01L
rhubarbwu/TinyStories-02x1024_10L  rhubarbwu/TinyStories-12x0256_10L
rhubarbwu/TinyStories-04x0064_01L  rhubarbwu/TinyStories-12x0512_01L
rhubarbwu/TinyStories-04x0064_10L  rhubarbwu/TinyStories-12x0512_10L
rhubarbwu/TinyStories-04x0128_01L  rhubarbwu/TinyStories-12x0768_01L
rhubarbwu/TinyStories-04x0128_10L  rhubarbwu/TinyStories-12x0768_10L
rhubarbwu/TinyStories-04x0256_01L  rhubarbwu/TinyStories-12x1024_01L
rhubarbwu/TinyStories-04x0256_10L  rhubarbwu/TinyStories-12x1024_10L // exists
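
The three listings above follow a regular grid, so they can be regenerated rather than copied by hand. The sketch below assumes the grid read off the listings (5 depths, 6 hidden dimensions, 2 epoch counts, 3 suffixes, plus 20 permutation-test duplicates). The helper names `family_paths` and `permutation_duplicates` are hypothetical, and most of the generated paths no longer resolve on the Hub.

```python
from itertools import product

# Grid values read off the listings above (assumed to be exhaustive).
LAYERS = (1, 2, 4, 8, 12)
HIDDEN_DIMS = (64, 128, 256, 512, 768, 1024)
EPOCHS = (1, 10)
SUFFIXES = ("n", "d", "L")  # no, minimal, and full weight decay


def family_paths(suffix: str) -> list[str]:
    """All past model paths for one regularization suffix."""
    return [
        f"rhubarbwu/TinyStories-{layers:02d}x{dim:04d}_{epochs:02d}{suffix}"
        for layers, dim, epochs in product(LAYERS, HIDDEN_DIMS, EPOCHS)
    ]


def permutation_duplicates() -> list[str]:
    """The 20 duplicates of TinyStories-02x0768_01d used for the permutation test."""
    return [f"rhubarbwu/TinyStories-02x0768_01d{i:02d}" for i in range(20)]


all_paths = [p for s in SUFFIXES for p in family_paths(s)] + permutation_duplicates()
print(len(all_paths))  # 3 * 60 + 20 = 200
```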
Safetensors · Model size: 205M params · Tensor type: F32