Update README.md
README.md
CHANGED
tags:
- frozen-embeddings
---
# bvv241-max
* **Progressive Layer-wise Growth**: Deep Transformers can be "grown" by progressively stacking and training one layer at a time, showing stable convergence and correlation between depth and reasoning abilities.
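
As a rough illustration of this layer-wise growth idea (a minimal PyTorch sketch, not code from this repository; the layer class, sizes, data and objective are placeholders), each growth stage freezes the layers trained so far and appends one new trainable layer:

```python
import torch
import torch.nn as nn

d_model, n_heads = 256, 4  # placeholder sizes, for illustration only

class GrowingStack(nn.Module):
    """Toy Transformer stack that is grown one layer at a time."""
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList()

    def grow(self):
        # Freeze everything trained so far, then append a fresh trainable layer.
        for p in self.parameters():
            p.requires_grad = False
        self.layers.append(nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = GrowingStack()
for stage in range(3):  # grow three layers, one per stage
    model.grow()
    opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4)
    x = torch.randn(2, 8, d_model)   # stand-in batch of hidden states
    loss = model(x).pow(2).mean()    # stand-in training objective
    loss.backward()
    opt.step()
```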
You can use the model with the `transformers` library. Note that this model uses a custom architecture, so `trust_remote_code=True` is required for proper loading. This model also utilizes precomputed, frozen embeddings that are loaded separately.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import hf_hub_download
import torch

# Load the tokenizer that ships with the repository
tokenizer = AutoTokenizer.from_pretrained('Bochkov/bvv241-max', trust_remote_code=True)

# The precomputed, frozen embedding matrix is distributed separately; download and load it
# from the repository (the exact filename depends on the repo layout), e.g.:
# embeddings = torch.load(hf_hub_download(repo_id='Bochkov/bvv241-max', filename='<embeddings-file>.pt'))
#
# You may then need to replace the model's token-embedding weights
# with the 'embeddings' loaded above, depending on the model's specific
# 'forward' method or initialization.
model = AutoModelForCausalLM.from_pretrained(
    'Bochkov/bvv241-max',
    torch_dtype=torch.float32, # or torch.bfloat16, depending on your setup
    low_cpu_mem_usage=True,
    trust_remote_code=True
)
# model.transformer.wte.weight = torch.nn.Parameter(embeddings).to(model.device)
model.eval() # Set to evaluation mode

# Example text generation
prompt = "The key to life is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Move to GPU if available
if torch.cuda.is_available():
    model.to("cuda")
    input_ids = input_ids.to("cuda")

# Generate text
output_ids = model.generate(input_ids, max_new_tokens=20, do_sample=True, temperature=0.7)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generated_text)
```
## Tokenizer and Embedding Variants

This repository provides various Unicode-based tokenizers and precomputed, L2-normalized, frozen embedding matrices for direct use in `nn.Embedding`. These embeddings contain **no semantic information** and are designed for research into emergent semantics in transformer layers.

1. **`bvv241-2-3`**: Base Unicode plane (0–65535) with Wikipedia bigrams/trigrams in private Unicode ranges. (65,536 tokens, 1024-dim frozen embedding)
2. **`bvv241-max` (This Model)**: Combines Unicode monograms + bigrams/trigrams + the intersection of token strings from SOTA models. (131,072 tokens, 1024-dim frozen embedding)
3. **`bvv241-nemo`**: Vocabulary of the Mistral-Nemo SOTA model with frozen surface-level embeddings. (131,072 tokens, 1024-dim frozen embedding)
4. **`bvv241-abs`**: Similar to `bvv241-max`, but with an embedding size of 4096.

These variants are designed to enable flexible experimentation with modular model fusion and the study of semantic emergence in LLMs.
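
As a minimal usage sketch of this frozen-embedding setup (the tensor below is a random stand-in for one of the precomputed matrices; the shapes follow the `bvv241-max` variant), such a matrix can be plugged directly into a standard `nn.Embedding` and kept frozen:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Random stand-in for a precomputed matrix: 131,072 tokens x 1024 dims, with L2-normalized rows.
embeddings = F.normalize(torch.randn(131072, 1024), dim=-1)

emb_layer = nn.Embedding.from_pretrained(embeddings, freeze=True)  # frozen: never updated during training
token_ids = torch.tensor([[17, 42, 2025]])
print(emb_layer(token_ids).shape)  # torch.Size([1, 3, 1024])
```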
## Citation
If you find this work helpful or inspiring, please consider citing the associated papers:

```bibtex
@misc{bochkov2025growingtransformersmodularcomposition,
      title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
      author={A. Bochkov},
      year={2025},
      eprint={2507.07129},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.07129},
}

@misc{bochkov2025emergentsemanticstokenembeddings,
      title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
      author={A. Bochkov},
      year={2025},
      eprint={2507.04886},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.04886},
}
```
---
# best_bvv_unfrozen_zh
[📄 Paper (Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations)](https://huggingface.co/papers/2507.04886) -
[📄 Paper (Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate)](https://huggingface.co/papers/2507.07129) -
[💻 Code](https://github.com/AVBochkov/Embeddings)
# Model summary
best_bvv_unfrozen_zh is a 500M-parameter Causal Language Model (LM) trained as an open proof-of-concept for the "frozen embeddings" paradigm. This version uses fully trainable token embeddings (a standard setup) and serves as a baseline for direct comparison with the corresponding "frozen-embedding" model, Bochkov/best_bvv_zh.

- **Architecture:** Transformer, rotary positional encoding
- **Vocabulary:** custom Unicode-based, 131,072 tokens
- **Embedding:** unfrozen (trainable, classic)
- **Pretraining data:** 9B tokens (Wikipedia, SQuAD 2.0, TriviaQA, NQ, etc.) with 10% SFT (instruction/factual Q&A) mixed in
- **Purpose:** compare learning capacity and generalization of full vs. frozen-embedding LMs on small data (see the sketch below)
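
A hedged sketch of what that comparison amounts to in code (generic `transformers`/PyTorch; it assumes the custom architecture exposes the standard `get_input_embeddings()` hook, and the `freeze_embeddings` flag is purely illustrative): the frozen-embedding counterpart switches off gradients for the token-embedding matrix, whereas this baseline leaves it trainable.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('Bochkov/best_bvv_unfrozen_zh', trust_remote_code=True)

freeze_embeddings = False  # set True to emulate the frozen-embedding setup (cf. Bochkov/best_bvv_zh)
if freeze_embeddings:
    model.get_input_embeddings().weight.requires_grad_(False)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable / 1e6:.0f}M")
```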
## Example Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Requires a CUDA device; drop the .to('cuda') calls to run on CPU.
model = AutoModelForCausalLM.from_pretrained('Bochkov/best_bvv_unfrozen_zh', trust_remote_code=True).to('cuda')
tokenizer = AutoTokenizer.from_pretrained('Bochkov/best_bvv_unfrozen_zh')

inputs = tokenizer("Hello, world! ", return_tensors="pt").to('cuda')
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    do_sample=True
)
print(tokenizer.decode(outputs[0]))
```
## Citation
If you find this work helpful or inspiring, please consider citing the associated papers:

```bibtex
@misc{bochkov2025emergentsemanticstokenembeddings,
      title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
      author={A. Bochkov},
      year={2025},
      eprint={2507.04886},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.04886},
}

@misc{bochkov2025growingtransformersmodularcomposition,
      title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
      author={A. Bochkov},
      year={2025},
      eprint={2507.07129},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.07129},
}
```