BitNet2 Model with H-BitLinear Layers
This is a BitNet2 model that uses H-BitLinear layers for efficient computation. The model maintains the original BitNetModel2 architecture while being compatible with Hugging Face Transformers.
Model Details
- Model Type: BitNet2 with H-BitLinear layers
- Architecture: Transformer with H-BitLinear feed-forward networks
- Parameters: ~414M parameters
- Hidden Size: 512
- Layers: 12
- Attention Heads: 8
- Intermediate Size: 2048
- Vocabulary Size: 128,256
- Max Sequence Length: 128
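These hyperparameters should be visible on the loaded configuration. A minimal check, assuming the placeholder repository id from the Usage section below and standard Transformers field names (the custom BitNet2 configuration class may name some of them differently):

```python
from transformers import AutoConfig

# Placeholder repo id from the Usage section; field names are the standard
# Transformers ones and may differ in the custom BitNet2 config class.
config = AutoConfig.from_pretrained("YOUR_USERNAME/proper-bitnet2-model", trust_remote_code=True)
print(config.hidden_size)               # 512
print(config.num_hidden_layers)         # 12
print(config.num_attention_heads)       # 8
print(config.intermediate_size)         # 2048
print(config.vocab_size)                # 128256
print(config.max_position_embeddings)   # 128
```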
Key Features
- H-BitLinear Layers: Uses Hadamard-based linear layers for improved efficiency
- Hugging Face Compatible: Loads through the standard Transformers API (trust_remote_code=True required)
- Custom Architecture: Maintains the original BitNetModel2 structure
- Optimized for Inference: Designed for fast text generation
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "YOUR_USERNAME/proper-bitnet2-model"
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
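The snippet above uses greedy decoding; the usual generate() sampling arguments also work (the temperature and top_p values here are only illustrative):

```python
# Sampled generation instead of greedy decoding
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```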
Architecture Details
The model uses:
- H-BitLinear Layers: Hadamard-based linear transformations for the feed-forward networks (a sketch follows this list)
- Multi-head Attention: Standard transformer attention mechanism
- Layer Normalization: Applied before attention and feed-forward layers
- GELU Activation: Used in feed-forward networks
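The H-BitLinear layer is the non-standard component, so here is a minimal, hypothetical sketch of how such a layer is commonly built: a fast Walsh-Hadamard transform on the activations, followed by absmean-ternarized weights and absmax 8-bit activations, with straight-through estimators so the layer stays trainable. This illustrates the general technique only, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def hadamard_transform(x: torch.Tensor) -> torch.Tensor:
    """Normalized fast Walsh-Hadamard transform over the last dimension.
    Requires that dimension to be a power of two (512 and 2048 both are)."""
    d = x.shape[-1]
    h = 1
    while h < d:
        x = x.reshape(*x.shape[:-1], d // (2 * h), 2, h)
        a, b = x[..., 0, :], x[..., 1, :]
        x = torch.stack((a + b, a - b), dim=-2).reshape(*x.shape[:-3], -1)
        h *= 2
    return x / (d ** 0.5)

class HBitLinear(nn.Module):
    """Hypothetical H-BitLinear: Hadamard-rotated activations, ternary weights."""

    def __init__(self, in_features: int, out_features: int, bias: bool = False):
        super().__init__()
        assert in_features & (in_features - 1) == 0, "in_features must be a power of two"
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * in_features ** -0.5)
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rotate activations so outliers are spread out before quantization.
        x = hadamard_transform(x)
        # Per-token absmax fake quantization of activations to the int8 range,
        # with a straight-through estimator for the gradient.
        x_scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 127.0
        x_q = (x / x_scale).round().clamp(-128, 127) * x_scale
        x = x + (x_q - x).detach()
        # Absmean ternarization of weights to {-1, 0, +1}, also fake-quantized.
        w = self.weight
        w_scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / w_scale).round().clamp(-1, 1) * w_scale
        w = w + (w_q - w).detach()
        return F.linear(x, w, self.bias)

# With in_features=512 and out_features=2048 this matches the feed-forward shapes above.
layer = HBitLinear(512, 2048)
y = layer(torch.randn(2, 16, 512))   # (batch, seq, hidden) -> (2, 16, 2048)
```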
Training
This model was trained using the BitNetModel2 architecture with H-BitLinear layers. The training process involved:
- A custom training loop with layer-skipping capabilities (a generic sketch follows below)
- The H-BitLinear implementation for efficient computation
- Optimization for both training and inference
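The layer-skipping training loop itself is not published in this card. As a rough illustration only, LayerDrop-style stochastic skipping during training typically looks like this hypothetical sketch (the class and argument names are invented for the example):

```python
import torch
import torch.nn as nn

class SkippableBlockStack(nn.Module):
    """Illustrative only: stochastically skip transformer blocks during training."""

    def __init__(self, blocks: nn.ModuleList, skip_prob: float = 0.1):
        super().__init__()
        self.blocks = blocks
        self.skip_prob = skip_prob

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            if self.training and torch.rand(()) < self.skip_prob:
                continue  # skip this block for the current batch
            hidden_states = block(hidden_states)
        return hidden_states
```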
Performance
The model is designed for:
- Fast inference with early-exit capabilities
- Efficient memory usage through H-BitLinear layers
- Compatibility with standard Hugging Face pipelines (example below)
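For example, the standard text-generation pipeline should work as long as remote code is trusted (the repository and tokenizer ids are taken from the Usage section above):

```python
from transformers import pipeline

# trust_remote_code is required because the architecture is custom.
generator = pipeline(
    "text-generation",
    model="YOUR_USERNAME/proper-bitnet2-model",
    tokenizer="meta-llama/Meta-Llama-3-8B-Instruct",
    trust_remote_code=True,
)
print(generator("The future of artificial intelligence", max_new_tokens=50)[0]["generated_text"])
```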
Citation
If you use this model, please cite the original BitNet paper and acknowledge the H-BitLinear implementation.
License
This model is released under the MIT License.