Bochkov committed on
Commit c53cf58 · verified · 1 Parent(s): d64632a

Update README.md

Files changed (1)
  1. README.md +34 -69
README.md CHANGED
@@ -11,89 +11,44 @@ tags:
  - frozen-embeddings
  ---
 
- # Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate
 
- This repository contains the `bvv241-max` model, a key component and result of the paper **"Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate"** ([https://huggingface.co/papers/2507.07129](https://huggingface.co/papers/2507.07129)).
 
- This research introduces a constructive approach to Large Language Model (LLM) development that builds on **non-trainable, deterministic input embeddings**. This fixed representational substrate acts as a universal "docking port" and enables two efficient scaling paradigms:
- * **Modular Composition**: specialist models trained on diverse datasets (e.g., different languages) can be merged into a single Mixture-of-Experts (MoE) model post-training, with zero architectural modification.
- * **Progressive Layer-wise Growth**: deep Transformers can be "grown" by progressively stacking and training one layer at a time, showing stable convergence and a correlation between depth and reasoning ability (a toy sketch follows this list).
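To make the layer-wise growth idea concrete, here is a minimal toy sketch. It is an illustration only, not the paper's training code; the layer counts, dimensions, and objective are invented for the example. Previously trained blocks and the frozen embedding substrate stay fixed, and only the newly appended block (plus a head) receives gradients:

```python
# Toy illustration of progressive layer-wise growth (not the paper's training code):
# earlier blocks and the frozen embedding substrate stay fixed; only the newly
# appended block and the output head are trained.
import torch
import torch.nn as nn

d_model, n_heads, vocab, seq_len = 256, 4, 1000, 32

frozen_emb = nn.Embedding(vocab, d_model)            # fixed "docking port" substrate
frozen_emb.weight.requires_grad_(False)

trained_blocks = nn.ModuleList(                      # already-trained layers, kept frozen
    [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(2)]
)
for p in trained_blocks.parameters():
    p.requires_grad_(False)

new_block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
head = nn.Linear(d_model, vocab)
optimizer = torch.optim.AdamW(list(new_block.parameters()) + list(head.parameters()), lr=1e-4)

tokens = torch.randint(0, vocab, (8, seq_len))       # dummy batch of token ids
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

x = frozen_emb(tokens)
with torch.no_grad():                                # no gradients through the frozen stack
    for block in trained_blocks:
        x = block(x, src_mask=causal_mask)
logits = head(new_block(x, src_mask=causal_mask))    # only this path is trainable

loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab), tokens[:, 1:].reshape(-1)
)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```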
 
- The project aims for resource-efficient scaling, continual learning, and a more democratized ecosystem for building powerful AI systems.
-
- Find the official code and more resources on the [GitHub repository](https://github.com/Bochkov/bvv241-max).
-
- ## Quick Start
-
- You can use the model with the `transformers` library. Note that this model uses a custom architecture, so `trust_remote_code=True` is required for proper loading. This model also utilizes precomputed, frozen embeddings that are loaded separately.
 
  ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
- from huggingface_hub import hf_hub_download
  import torch
-
- # Load tokenizer
- tokenizer = AutoTokenizer.from_pretrained('Bochkov/bvv241-max', trust_remote_code=True)
-
- # Load the precomputed, frozen embedding matrix
- # This model uses a unique approach where embeddings are fixed and loaded separately.
- emb_path = hf_hub_download(repo_id="Bochkov/bvv241-max", filename="normalized_embeddings_weights.pt")
- embeddings = torch.load(emb_path)  # shape: [vocab_size, emb_dim]
-
- # Load the model. Depending on the model's specific 'forward' method or initialization,
- # you will typically need to initialize its embedding layer with the 'embeddings' loaded above.
- model = AutoModelForCausalLM.from_pretrained(
-     'Bochkov/bvv241-max',
-     torch_dtype=torch.float32,  # or torch.bfloat16, depending on your setup
-     low_cpu_mem_usage=True,
-     trust_remote_code=True
  )
- # Example of how you might assign the embeddings if the model's embedding layer is named 'wte':
- # model.transformer.wte.weight = torch.nn.Parameter(embeddings.to(model.device, dtype=model.dtype), requires_grad=False)
- model.eval()  # Set to evaluation mode
-
- # Example text generation
- prompt = "The key to life is"
- input_ids = tokenizer.encode(prompt, return_tensors="pt")
-
- # Move to GPU if available
- if torch.cuda.is_available():
-     model.to("cuda")
-     input_ids = input_ids.to("cuda")
-
- # Generate text
- output_ids = model.generate(input_ids, max_new_tokens=20, do_sample=True, temperature=0.7)
- generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
-
- print(generated_text)
  ```
 
- ## Tokenizer and Embedding Variants
- This repository provides various Unicode-based tokenizers and precomputed, L2-normalized, frozen embedding matrices for direct use in `nn.Embedding`. These embeddings contain **no semantic information** and are designed for research into emergent semantics in transformer layers.
-
- 1. **`bvv241-2-3`**: Base Unicode plane (0–65535) with Wikipedia bigrams/trigrams in private Unicode ranges (65,536 tokens, 1024-dim frozen embedding).
- 2. **`bvv241-max`** (this model): Combines Unicode monograms + bigrams/trigrams + the intersection of token strings from SOTA models (131,072 tokens, 1024-dim frozen embedding).
- 3. **`bvv241-nemo`**: Vocabulary of the Mistral-Nemo SOTA model with frozen surface-level embeddings (131,072 tokens, 1024-dim frozen embedding).
- 4. **`bvv241-abs`**: Similar to `bvv241-max`, but with an embedding size of 4096.
-
- These variants are designed to enable flexible experimentation with modular model fusion and the study of semantic emergence in LLMs.
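For example, any of the matrices above can be dropped into a frozen `nn.Embedding` in a few lines. This is a minimal sketch assuming the same `normalized_embeddings_weights.pt` file name used in the Quick Start; adjust the `repo_id` for the other variants:

```python
# Minimal sketch: load a precomputed, L2-normalized embedding matrix and expose it
# as a non-trainable nn.Embedding for use in a custom model.
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download

emb_path = hf_hub_download(repo_id="Bochkov/bvv241-max", filename="normalized_embeddings_weights.pt")
weights = torch.load(emb_path, map_location="cpu")               # [vocab_size, emb_dim]

embedding = nn.Embedding.from_pretrained(weights, freeze=True)   # frozen substrate
print(embedding.weight.shape, embedding.weight.requires_grad)    # torch.Size([131072, 1024]) False
```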
-
  ## Citation
 
  If you find this work helpful or inspiring, please consider citing the associated papers:
 
  ```bibtex
- @misc{bochkov2025growingtransformersmodularcomposition,
- title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
- author={A. Bochkov},
- year={2025},
- eprint={2507.07129},
- archivePrefix={arXiv},
- primaryClass={cs.LG},
- url={https://arxiv.org/abs/2507.07129},
- }
-
  @misc{bochkov2025emergentsemanticstokenembeddings,
  title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
  author={A. Bochkov},
@@ -103,4 +58,14 @@ If you find this work helpful or inspiring, please consider citing the associate
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.04886},
  }
  ```
 
  - frozen-embeddings
  ---
 
+ # best_bvv_unfrozen_zh
 
+ [📚 Paper (Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations)](https://huggingface.co/papers/2507.04886) -
+ [📚 Paper (Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate)](https://huggingface.co/papers/2507.07129) -
+ [💻 Code](https://github.com/AVBochkov/Embeddings)
 
+ # Model summary
+ `best_bvv_unfrozen_zh` is a 500M-parameter causal language model (LM) trained as an open proof of concept for the "frozen embeddings" paradigm. This version uses fully trainable token embeddings (the standard setup) and serves as a baseline for direct comparison with the corresponding frozen-embedding model `Bochkov/best_bvv_zh`.
 
+ - Architecture: Transformer with rotary positional encoding
+ - Vocabulary: custom Unicode-based, 131,072 tokens
+ - Embeddings: unfrozen (trainable, the classic setup)
+ - Pretraining data: 9B tokens (Wikipedia, SQuAD 2.0, TriviaQA, Natural Questions, etc.) with 10% SFT (instruction/factual Q&A) mixed in
+ - Purpose: compare the learning capacity and generalization of full vs. frozen-embedding LMs on small data
 
 
+ ## Example Usage
  ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch
+ model = AutoModelForCausalLM.from_pretrained('Bochkov/best_bvv_unfrozen_zh', trust_remote_code=True).to('cuda')
+ tokenizer = AutoTokenizer.from_pretrained('Bochkov/best_bvv_unfrozen_zh')
+ inputs = tokenizer("Hello, world! ", return_tensors="pt").to('cuda')
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=100,
+     temperature=0.8,
+     top_k=50,
+     top_p=0.95,
+     do_sample=True
  )
+ print(tokenizer.decode(outputs[0]))
  ```
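Since this checkpoint is meant as a baseline for the frozen-embedding counterpart `Bochkov/best_bvv_zh`, a quick qualitative check is to complete the same prompt with both checkpoints using the same API as above. This is only an illustrative sketch (the prompt and decoding settings are arbitrary), not an official evaluation:

```python
# Illustrative side-by-side check of the unfrozen baseline vs. its frozen-embedding
# counterpart. A real comparison would use held-out benchmarks, not a single prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
prompt = "The capital of France is"

for repo in ["Bochkov/best_bvv_unfrozen_zh", "Bochkov/best_bvv_zh"]:
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).to(device).eval()
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    print(f"{repo}: {tokenizer.decode(outputs[0])}")
```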
 
  ## Citation
 
  If you find this work helpful or inspiring, please consider citing the associated papers:
 
  ```bibtex
  @misc{bochkov2025emergentsemanticstokenembeddings,
  title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
  author={A. Bochkov},
 
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.04886},
  }
+
+ @misc{bochkov2025growingtransformersmodularcomposition,
+ title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
+ author={A. Bochkov},
+ year={2025},
+ eprint={2507.07129},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG},
+ url={https://arxiv.org/abs/2507.07129},
+ }
  ```