BitNet2 Model with H-BitLinear Layers

This is a BitNet2 model that uses H-BitLinear layers for efficient computation. The model maintains the original BitNetModel2 architecture while being compatible with Hugging Face Transformers.

Model Details

  • Model Type: BitNet2 with H-BitLinear layers
  • Architecture: Transformer with H-BitLinear feed-forward networks
  • Parameters: ~414M parameters
  • Hidden Size: 512
  • Layers: 12
  • Attention Heads: 8
  • Intermediate Size: 2048
  • Vocabulary Size: 128,256
  • Max Sequence Length: 128

Key Features

  • H-BitLinear Layers: Uses Hadamard-based linear layers for improved efficiency
  • Hugging Face Compatible: Loads through the standard Transformers API (AutoModelForCausalLM with trust_remote_code=True)
  • Custom Architecture: Maintains the original BitNetModel2 structure
  • Optimized for Inference: Designed for fast text generation

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "YOUR_USERNAME/proper-bitnet2-model"
# The tokenizer is reused from Llama 3, matching the model's 128,256-token vocabulary
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Architecture Details

The model uses:

  • H-BitLinear Layers: Hadamard-based linear transformations for feed-forward networks (see the sketch after this list)
  • Multi-head Attention: Standard transformer attention mechanism
  • Layer Normalization: Applied before attention and feed-forward layers
  • GELU Activation: Used in feed-forward networks
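
The sketch below illustrates how an H-BitLinear feed-forward layer could look in PyTorch, assuming the layer applies a Hadamard rotation to its input followed by a linear projection with ternary-quantized weights. The class name HBitLinear, the Sylvester construction of the Hadamard matrix, and the quantization scheme are illustrative assumptions, not the repository's actual implementation.

# Minimal sketch of an H-BitLinear-style layer (assumed design, not the actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def hadamard_matrix(n: int) -> torch.Tensor:
    # Sylvester construction; n must be a power of two (512 and 2048 both are).
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / (n ** 0.5)  # orthonormal scaling

class HBitLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.register_buffer("hadamard", hadamard_matrix(in_features))

    def quantize_weight(self) -> torch.Tensor:
        # Ternary {-1, 0, +1} weights with a per-tensor scale; straight-through gradient.
        scale = self.weight.abs().mean().clamp(min=1e-5)
        w_q = torch.clamp(torch.round(self.weight / scale), -1, 1) * scale
        return self.weight + (w_q - self.weight).detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x @ self.hadamard.T                   # rotate activations with the Hadamard transform
        return F.linear(x, self.quantize_weight())

# Example feed-forward block matching this card's dimensions (512 -> 2048 -> 512).
ffn = nn.Sequential(HBitLinear(512, 2048), nn.GELU(), HBitLinear(2048, 512))
print(ffn(torch.randn(2, 128, 512)).shape)  # torch.Size([2, 128, 512])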

Training

This model was trained using the BitNetModel2 architecture with H-BitLinear layers. The training process involved:

  • A custom training loop with layer-skipping capabilities (sketched below)
  • An H-BitLinear implementation for efficient computation
  • Optimization for both training and inference
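
As a rough illustration of layer skipping, the snippet below randomly bypasses transformer layers during training. The skip probability and the simple identity bypass are assumptions for illustration; the actual BitNetModel2 training loop may differ.

# Assumed sketch of stochastic layer skipping, not the repository's actual training code.
import torch
import torch.nn as nn

def forward_with_layer_skip(layers: nn.ModuleList, hidden: torch.Tensor,
                            skip_prob: float = 0.1, training: bool = True) -> torch.Tensor:
    for layer in layers:
        if training and torch.rand(()) < skip_prob:
            continue  # identity path: this layer is skipped for this step
        hidden = layer(hidden)
    return hidden

# Toy usage with 12 stand-in layers at the card's hidden size of 512.
layers = nn.ModuleList([nn.Linear(512, 512) for _ in range(12)])
print(forward_with_layer_skip(layers, torch.randn(2, 128, 512)).shape)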

Performance

The model is designed for:

  • Fast inference with early exit capabilities
  • Efficient memory usage through H-BitLinear layers
  • Compatibility with standard Hugging Face pipelines (see the example below)
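
Because the model loads through the standard Transformers API, it can also be driven through a text-generation pipeline. The snippet below reuses the placeholder repository name from the Usage section and passes trust_remote_code=True since the architecture is custom.

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="YOUR_USERNAME/proper-bitnet2-model",
    tokenizer="meta-llama/Meta-Llama-3-8B-Instruct",
    trust_remote_code=True,
)
print(generator("The future of artificial intelligence", max_new_tokens=50)[0]["generated_text"])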

Citation

If you use this model, please cite the original BitNet paper and acknowledge the H-BitLinear implementation.

License

This model is released under the MIT License.
