Bochkov committed · Commit cbe258a · verified · 1 Parent(s): 57116dc

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -16,6 +16,7 @@ tags:
 ## Model Description
 
 `abs-bvv-2` is a 1.5 billion parameter decoder-only Transformer model. It is the second model in the **Progressive Growth Transformers (PGT)** series, designed to explore how linguistic and reasoning capabilities emerge as a function of model depth.
+This model is presented in the paper [Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate](https://huggingface.co/papers/2507.07129).
 
 This model was not trained monolithically. Instead, it was "grown" constructively, one layer at a time, upon a foundation of **frozen, non-semantic visual embeddings**, as introduced in the paper "[Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations](https://arxiv.org/abs/2507.04886)".
 
@@ -26,7 +27,7 @@ The core idea is to demonstrate an alternative, more modular and resource-effic
 
 `abs-bvv-2` represents the state of the model after 2 layers of progressive training. It has 2 Transformer blocks, a hidden dimension of 4096, and uses the `bvv241` tokenizer family.
 
-**Code:** [https://github.com/Bochkov/bvv241](https://github.com/Bochkov/bvv241)
+**Code:** [https://github.com/AVBochkov/PGT](https://github.com/AVBochkov/PGT)
 
 ## Intended Use
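
For readers who want to try the checkpoint described by this model card, the following is a minimal sketch of loading it with Hugging Face Transformers. The repo id `Bochkov/abs-bvv-2` and the use of `trust_remote_code=True` (for a custom PGT model class) are assumptions, not details confirmed by this commit.

```python
# Minimal sketch: load and sample from abs-bvv-2.
# ASSUMPTIONS (not confirmed by this commit): the checkpoint is published
# as "Bochkov/abs-bvv-2" and the custom architecture ships remote code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Bochkov/abs-bvv-2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 1.5B parameters fit comfortably in fp16
    trust_remote_code=True,     # assumed: PGT may use a custom model class
)
model.eval()

prompt = "The Progressive Growth Transformer is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Since this checkpoint reflects only 2 layers of progressive training, generations should be judged in light of the PGT series' goal of studying how capability emerges with depth.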