BitNet2 Model with H-BitLinear Layers

This is a BitNet2 model that uses H-BitLinear layers for efficient computation. The model maintains the original BitNetModel2 architecture while being compatible with Hugging Face Transformers.

Model Details

  • Model Type: BitNet2 with H-BitLinear layers
  • Architecture: Transformer with H-BitLinear feed-forward networks
  • Parameters: ~414M parameters
  • Hidden Size: 512
  • Layers: 12
  • Attention Heads: 8
  • Intermediate Size: 2048
  • Vocabulary Size: 128,256
  • Max Sequence Length: 128

Key Features

  • H-BitLinear Layers: Uses Hadamard-based linear layers for improved efficiency
  • Hugging Face Compatible: Loads through the standard Transformers API (AutoModelForCausalLM with trust_remote_code=True)
  • Custom Architecture: Maintains the original BitNetModel2 structure
  • Optimized for Inference: Designed for fast text generation

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "YOUR_USERNAME/proper-bitnet2-model"
# The tokenizer is reused from Llama 3, matching the model's 128,256-token vocabulary
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Architecture Details

The model uses:

  • H-BitLinear Layers: Hadamard-based linear transformations for feed-forward networks (see the sketch after this list)
  • Multi-head Attention: Standard transformer attention mechanism
  • Layer Normalization: Applied before attention and feed-forward layers
  • GELU Activation: Used in feed-forward networks
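
The sketch below illustrates how an H-BitLinear feed-forward layer could look in PyTorch, assuming the layer applies a Hadamard rotation to its input followed by a linear projection with ternary-quantized weights. The class name HBitLinear, the Sylvester construction of the Hadamard matrix, and the quantization scheme are illustrative assumptions, not the repository's actual implementation.

# Minimal sketch of an H-BitLinear-style layer (assumed design, not the actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

def hadamard_matrix(n: int) -> torch.Tensor:
    # Sylvester construction; n must be a power of two (512 and 2048 both are).
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / (n ** 0.5)  # orthonormal scaling

class HBitLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.register_buffer("hadamard", hadamard_matrix(in_features))

    def quantize_weight(self) -> torch.Tensor:
        # Ternary {-1, 0, +1} weights with a per-tensor scale; straight-through gradient.
        scale = self.weight.abs().mean().clamp(min=1e-5)
        w_q = torch.clamp(torch.round(self.weight / scale), -1, 1) * scale
        return self.weight + (w_q - self.weight).detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x @ self.hadamard.T                   # rotate activations with the Hadamard transform
        return F.linear(x, self.quantize_weight())

# Example feed-forward block matching this card's dimensions (512 -> 2048 -> 512).
ffn = nn.Sequential(HBitLinear(512, 2048), nn.GELU(), HBitLinear(2048, 512))
print(ffn(torch.randn(2, 128, 512)).shape)  # torch.Size([2, 128, 512])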

Training

This model was trained using the BitNetModel2 architecture with H-BitLinear layers. The training process involved:

  • A custom training loop with layer-skipping capabilities (sketched below)
  • An H-BitLinear implementation for efficient computation
  • Optimization for both training and inference
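
As a rough illustration of layer skipping, the snippet below randomly bypasses transformer layers during training. The skip probability and the simple identity bypass are assumptions for illustration; the actual BitNetModel2 training loop may differ.

# Assumed sketch of stochastic layer skipping, not the repository's actual training code.
import torch
import torch.nn as nn

def forward_with_layer_skip(layers: nn.ModuleList, hidden: torch.Tensor,
                            skip_prob: float = 0.1, training: bool = True) -> torch.Tensor:
    for layer in layers:
        if training and torch.rand(()) < skip_prob:
            continue  # identity path: this layer is skipped for this step
        hidden = layer(hidden)
    return hidden

# Toy usage with 12 stand-in layers at the card's hidden size of 512.
layers = nn.ModuleList([nn.Linear(512, 512) for _ in range(12)])
print(forward_with_layer_skip(layers, torch.randn(2, 128, 512)).shape)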

Performance

The model is designed for:

  • Fast inference with early exit capabilities
  • Efficient memory usage through H-BitLinear layers
  • Compatibility with standard Hugging Face pipelines (see the example below)
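
Because the model loads through the standard Transformers API, it can also be driven through a text-generation pipeline. The snippet below reuses the placeholder repository name from the Usage section and passes trust_remote_code=True since the architecture is custom.

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="YOUR_USERNAME/proper-bitnet2-model",
    tokenizer="meta-llama/Meta-Llama-3-8B-Instruct",
    trust_remote_code=True,
)
print(generator("The future of artificial intelligence", max_new_tokens=50)[0]["generated_text"])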

Citation

If you use this model, please cite the original BitNet paper and acknowledge the H-BitLinear implementation.

License

This model is released under the MIT License.
