Update README.md
README.md CHANGED

@@ -16,6 +16,7 @@ tags:
 ## Model Description
 
 `abs-bvv-2` is a 1.5 billion parameter decoder-only Transformer model. It is the second model in the **Progressive Growth Transformers (PGT)** series, designed to explore how linguistic and reasoning capabilities emerge as a function of model depth.
 
+This model is presented in the paper [Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate](https://huggingface.co/papers/2507.07129).
 
 This model was not trained monolithically. Instead, it was "grown" constructively, one layer at a time, upon a foundation of **frozen, non-semantic visual embeddings**, as introduced in the paper "[Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations](https://arxiv.org/abs/2507.04886)".
 
@@ -26,7 +27,7 @@ The core idea is to demonstrate an alternative, more modular and resource-effici
 
 `abs-bvv-2` represents the state of the model after 2 layers of progressive training. It has 2 Transformer blocks, a hidden dimension of 4096, and uses the `bvv241` tokenizer family.
 
-**Code:** [https://github.com/
+**Code:** [https://github.com/AVBochkov/PGT](https://github.com/AVBochkov/PGT)
 
 ## Intended Use
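Since the card's central claim is the constructive, layer-wise growth recipe, a minimal PyTorch sketch of that loop may help make it concrete. This is an illustration under stated assumptions, not the authors' training code: the class, the vocabulary size, the use of `nn.TransformerEncoderLayer` as the decoder block, and the always-trainable output head are all stand-ins; the actual implementation is in the PGT repository linked in the diff above.

```python
import torch
import torch.nn as nn

# Illustrative constants: d_model matches the card (4096); the vocabulary
# size of the bvv241 tokenizer is an assumption, not taken from the release.
VOCAB, D_MODEL, N_HEADS = 65536, 4096, 32

class GrowingLM(nn.Module):
    """Decoder-only LM grown one block at a time on a frozen embedding substrate."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.embed.weight.requires_grad = False  # frozen, non-semantic embeddings
        self.blocks = nn.ModuleList()
        # Assumption: the output head stays trainable at every growth stage.
        self.lm_head = nn.Linear(D_MODEL, VOCAB, bias=False)

    def grow(self) -> None:
        """Freeze every block trained so far, then append one new trainable block."""
        for p in self.blocks.parameters():
            p.requires_grad = False
        self.blocks.append(
            nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
        )

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(ids)
        # Additive -inf mask above the diagonal makes attention causal.
        causal = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        for block in self.blocks:
            h = block(h, src_mask=causal)
        return self.lm_head(h)

model = GrowingLM()
model.grow()  # stage 1: only block 1 is trainable (abs-bvv-1 analogue)
model.grow()  # stage 2: block 1 is frozen, only block 2 trains (abs-bvv-2 analogue)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters this stage: {trainable}")
```

The point of the pattern is that each growth stage optimizes only the newest block, so the compute spent on earlier stages is reused rather than repeated; that is the modular, resource-efficient alternative to monolithic training that the card describes.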