---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- bvv
- non-frozen
- embedding
- research
- baseline
library_name: transformers
---

# pro_bvv_unfrozen: 200M baseline LM (non-frozen embeddings)

This repository contains the model and associated resources from the following papers:

- [📚 Paper (Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations)](https://huggingface.co/papers/2507.04886)
- [📚 Paper (Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate)](https://huggingface.co/papers/2507.07129)
- [💻 Code](https://github.com/AVBochkov/Embeddings)

**Description**

This is a baseline English language model (200M parameters) trained in the **classical** way, with **fully trainable** token embeddings. It is **provided for direct comparison** with the conceptually frozen-embedding variant.

**Training details**

- English corpus (~9B tokens), with 10% SFT data mixed in.
- All layers, including the token embeddings, are trainable.
- Hyperparameters and architecture match pro_bvv_en.

**Evaluation**

| Task | pro_bvv_unfrozen |
|---------|------------------|
| MMLU | 14.00% ± 0.14% |
| ARC-e | 24.09% ± 0.78% |
| ARC-c | 22.24% ± 1.04% |
| C-SENSE | 19.76% ± 0.52% |
| SQUAD | 13.28% ± 0.93% |

---

**⚠️ Limitations**

Research use only. The model was trained on a comparatively small corpus (~9B tokens), so quality, robustness, and reasoning are much lower than SOTA models. SFT was only lightly applied; the model is not intended for real-world use.

## 🧑‍🔬 Citation & Concept

If you use this model or the underlying concepts in your research, please cite our work:

```
@misc{bochkov2025emergentsemanticstokenembeddings,
      title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
      author={A. Bochkov},
      year={2025},
      eprint={2507.04886},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.04886},
}

@misc{bochkov2025growingtransformersmodularcomposition,
      title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
      author={A. Bochkov},
      year={2025},
      eprint={2507.07129},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.07129},
}
```

This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs, a step toward modular, fusable, multilingual LMs.

**Usage**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained('Bochkov/pro_bvv_unfrozen')
model = AutoModelForCausalLM.from_pretrained('Bochkov/pro_bvv_unfrozen', trust_remote_code=True).to('cuda')

inputs = torch.tensor([tokenizer.encode("Example input: ")], device='cuda')
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```
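For more varied completions, the same checkpoint can be used with sampling-based decoding. The snippet below is a minimal sketch, assuming the custom model code supports the standard `transformers` generation arguments (`do_sample`, `temperature`, `top_p`); the prompt and sampling values are illustrative, not tuned for this model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('Bochkov/pro_bvv_unfrozen')
model = AutoModelForCausalLM.from_pretrained('Bochkov/pro_bvv_unfrozen', trust_remote_code=True)

# Fall back to CPU when no GPU is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device).eval()

inputs = torch.tensor([tokenizer.encode("The capital of France is")], device=device)

# Sampling-based decoding; temperature/top_p are example values, not tuned for this model.
outputs = model.generate(
    inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0]))
```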
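The exact harness behind the evaluation table above is not described in this card. As a rough illustration of how such multiple-choice scores can be obtained, the sketch below compares per-choice log-likelihoods under the model (an ARC-style setup). It assumes the custom model returns standard causal-LM logits and that prompt tokens form a prefix of the full sequence; it is not the authors' evaluation code, and the example question is hypothetical.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('Bochkov/pro_bvv_unfrozen')
model = AutoModelForCausalLM.from_pretrained('Bochkov/pro_bvv_unfrozen', trust_remote_code=True).eval()

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to the answer tokens given the prompt."""
    prompt_ids = tokenizer.encode(prompt)
    full_ids = tokenizer.encode(prompt + " " + choice)  # assumes prompt_ids is a prefix of full_ids
    input_ids = torch.tensor([full_ids])
    with torch.no_grad():
        logits = model(input_ids).logits[0]  # assumes a standard CausalLMOutput with .logits
    log_probs = F.log_softmax(logits.float(), dim=-1)
    # Logits at position i predict token i + 1, so score only the answer positions.
    return sum(log_probs[pos - 1, full_ids[pos]].item()
               for pos in range(len(prompt_ids), len(full_ids)))

question = "Question: Which gas do plants absorb from the atmosphere? Answer:"
choices = ["oxygen", "carbon dioxide", "nitrogen", "helium"]
scores = [choice_logprob(question, c) for c in choices]
print(choices[scores.index(max(scores))])
```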
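Since this checkpoint is the fully trainable counterpart to the frozen-embedding models, the sketch below shows, in generic `transformers` terms, what excluding the embedding matrix from training looks like. It is not the papers' frozen visual-Unicode setup, only an illustration of the trainable/frozen distinction, and it assumes the custom model implements the standard `get_input_embeddings` accessor.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('Bochkov/pro_bvv_unfrozen', trust_remote_code=True)

# In this baseline, every parameter (including the embedding matrix) receives gradient updates.
# A frozen-embedding variant would instead take the embedding table out of training:
embeddings = model.get_input_embeddings()   # standard transformers accessor
embeddings.weight.requires_grad_(False)     # exclude the embedding matrix from gradient updates

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} / {total:,}")
```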