Llama3-8B-1.58
The Llama3-8B-1.58 models are large language models fine-tuned on the BitNet 1.58b architecture, starting from the base model Llama-3-8B-Instruct.
For a deeper dive into the methods and results, check out our blog post.
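As background on the "1.58" in the name: BitNet b1.58 restricts each weight to one of three values, {-1, 0, 1}, which carries log2(3) ≈ 1.58 bits of information. The sketch below illustrates the absmean ternary rounding described for that architecture. It is a pure-Python illustration and not the training code used for these models; the function name `absmean_ternary`, the `eps` value, and the sample weights are assumptions made for the example.

```python
import math

def absmean_ternary(weights, eps=1e-5):
    """Round weights to {-1, 0, 1} after scaling by their mean absolute value.

    Hypothetical helper for illustration only; `eps` is an assumed constant
    that guards against division by zero.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Scale, round to the nearest integer, then clip into [-1, 1].
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

# Made-up sample weights, not taken from any checkpoint.
weights = [0.9, -0.05, 0.4, -1.3, 0.02]
q, scale = absmean_ternary(weights)
print(q)             # ternary codes
print(scale)         # per-tensor scale kept alongside the codes
print(math.log2(3))  # bits of information per ternary weight
```

Multiplying the ternary codes by `scale` gives a coarse approximation of the original weights, which is why the scale is returned with them.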
You can easily load and test our model in Transformers. Just follow the code below:
Start by installing the Transformers version that includes the configuration needed to load BitNet models:
pip install git+https://github.com/huggingface/transformers.git@refs/pull/33410/head
Then load the model and generate a completion:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 1.58-bit checkpoint and the tokenizer of the base model.
model = AutoModelForCausalLM.from_pretrained("HF1BitLLM/Llama3-8B-1.58-Linear-10B-tokens", device_map="cuda", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

input_text = "Daniel went back to the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra went to the hallway. John went to the bedroom. Mary went back to the garden. Where is Mary?\nAnswer:"
input_ids = tokenizer.encode(input_text, return_tensors="pt").cuda()
# max_new_tokens limits only the generated continuation; the prompt alone is longer than 10 tokens.
output = model.generate(input_ids, max_new_tokens=10, do_sample=False)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
The model was trained on a subset of FineWeb-edu.
Starting Point
Training Duration
Dataset
Batch Size
Lambda Scheduler: 1 / (1 + exp(-k * (step / 1000 - 0.5)))
Learning Rate
Performance
Evaluation
Quantization
Key Findings
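The lambda scheduler formula listed above can be evaluated with a few lines of Python. This is a minimal sketch: the function name `lambda_schedule` is hypothetical, and the value of `k` is not given here, so the `K = 100.0` below is purely an illustrative assumption.

```python
import math

def lambda_schedule(step: int, k: float) -> float:
    """Evaluate 1 / (1 + exp(-k * (step / 1000 - 0.5))) for a training step."""
    return 1.0 / (1.0 + math.exp(-k * (step / 1000 - 0.5)))

# Assumed steepness, chosen only to show the shape of the curve.
K = 100.0

for step in (0, 250, 500, 750, 1000):
    print(step, lambda_schedule(step, K))
```

The curve is a sigmoid centered at step 500, where it equals exactly 0.5 for any `k`; larger values of `k` make the transition from near 0 to near 1 sharper.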
The models are evaluated on the nanotron checkpoints using LightEval.
@misc{,
title={1.58-Bit LLM: A New Era of Extreme Quantization},
author={Mohamed Mekkouri and Marc Sun and Leandro von Werra and Thomas Wolf},
year={2024},
}
Base model: meta-llama/Meta-Llama-3-8B-Instruct