metadata
license: other
tags:
- merge
- mergekit
- lazymergekit
- autoquant
- exl2
base_model:
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
Meta-Llama-3-120B-Instruct
Meta-Llama-3-120B-Instruct is a self-merge of meta-llama/Meta-Llama-3-70B-Instruct.
It was inspired by large merges like:
- alpindale/goliath-120b
- nsfwthrowitaway69/Venus-120b-v1.0
- cognitivecomputations/MegaDolphin-120b
- wolfram/miquliz-120b-v2.0.
Applications
I recommend using this model for creative writing. It uses the Llama 3 chat template with a default context window of 8K tokens (which can be extended by raising the RoPE theta).
Check the examples in the evaluation section to get an idea of its performance.
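To give a rough intuition for why the RoPE theta matters for context length, here is a minimal sketch in plain Python. It computes the wavelength of each rotary frequency pair for a given base. The head dimension (128) and base theta (500,000) are assumptions taken from the Llama 3 70B architecture and are not stated in this card; the 4x multiplier is an arbitrary illustration, not a recommended setting.

```python
import math


def rope_wavelengths(theta: float, head_dim: int) -> list[float]:
    """Wavelength (in tokens) of each RoPE frequency pair.

    Pair i rotates by theta ** (-2 * i / head_dim) radians per token,
    so one full turn takes 2 * pi * theta ** (2 * i / head_dim) tokens.
    """
    return [
        2 * math.pi * theta ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


HEAD_DIM = 128          # assumption: Llama 3 70B head dimension
BASE_THETA = 500_000.0  # assumption: rope_theta in the Llama 3 config

for theta in (BASE_THETA, 4 * BASE_THETA):  # 4x is an arbitrary illustration
    longest = rope_wavelengths(theta, HEAD_DIM)[-1]
    print(f"theta={theta:>11,.0f}  longest wavelength ~ {longest:,.0f} tokens")
```

A larger theta stretches the slowest rotations, so distant positions remain distinguishable over a longer span. Whether output quality holds at those lengths without further tuning is a separate question that this sketch does not address.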
Quantized models
Thanks to Eric Hartford, elinas, and the mlx-community for providing these models.
- GGUF: https://huggingface.co/cognitivecomputations/Meta-Llama-3-120B-Instruct-gguf
- EXL2: https://huggingface.co/elinas/Meta-Llama-3-120B-Instruct-4.0bpw-exl2
- mlx: https://huggingface.co/mlx-community/Meta-Llama-3-120B-Instruct-4bit
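As a back-of-the-envelope check on why quantized builds are useful here, the sketch below estimates the size of the weights alone at different precisions. The parameter count is an assumption (the nominal 120B from the model name), and the estimate ignores the KV cache, activations, and any per-format overhead.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 120e9  # assumption: nominal count taken from the "120B" in the name

for label, bpw in [("float16", 16.0), ("4.0 bpw", 4.0)]:
    print(f"{label:>8}: ~{weight_size_gb(N_PARAMS, bpw):.0f} GB")
```

Under these assumptions, a 4-bit build needs roughly a quarter of the memory of the float16 weights.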
Evaluation
The model looks excellent for creative writing tasks, outperforming GPT-4. Thanks again to Eric Hartford for noticing this.
- X thread by Eric Hartford (creative writing): https://twitter.com/erhartford/status/1787050962114207886
- X thread by Daniel Kaiser (creative writing): https://twitter.com/spectate_or/status/1787257261309518101
- X thread by Simon (reasoning): https://twitter.com/NewDigitalEdu/status/1787403266894020893
- r/LocalLLaMA: https://www.reddit.com/r/LocalLLaMA/comments/1cl525q/goliath_lovers_where_is_the_feedback_about/
Configuration
slices:
- sources:
  - layer_range: [0, 20]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [10, 30]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [20, 40]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [30, 50]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [40, 60]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [50, 70]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [60, 80]
    model: meta-llama/Meta-Llama-3-70B-Instruct
merge_method: passthrough
dtype: float16
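The slice ranges above can be checked with a little arithmetic. The following sketch stacks the seven overlapping 20-layer slices and estimates the resulting parameter count. The slice boundaries come from the configuration; the architecture dimensions (hidden size, MLP width, head counts, vocabulary size) are assumptions based on the Llama 3 70B config and are not stated in this card.

```python
# Slice boundaries copied from the configuration above (end index exclusive).
SLICES = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]

# Source layer used at each position of the merged stack.
merged_layers = [layer for start, end in SLICES for layer in range(start, end)]
total_layers = len(merged_layers)

# Assumed Llama 3 70B dimensions (not stated in this card).
HIDDEN, INTERMEDIATE, VOCAB = 8192, 28672, 128256
N_HEADS, N_KV_HEADS = 64, 8
head_dim = HIDDEN // N_HEADS

attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * N_KV_HEADS * head_dim  # q, o + k, v
mlp = 3 * HIDDEN * INTERMEDIATE                                  # gate, up, down
per_layer = attn + mlp + 2 * HIDDEN                              # + two RMSNorm weights
embeddings = 2 * VOCAB * HIDDEN + HIDDEN                         # input/output embeddings + final norm

total_params = total_layers * per_layer + embeddings
print(f"merged layers: {total_layers}")
print(f"estimated parameters: {total_params / 1e9:.1f}B")
```

Because each slice overlaps its neighbor by 10 layers, source layers 10 to 69 appear twice in the merged stack while layers 0 to 9 and 70 to 79 appear once, giving 140 layers in total. Under the assumed dimensions, that works out to roughly 122B parameters.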
Usage
# Notebook cell: install dependencies (drop the leading "!" in a regular shell)
!pip install -qU transformers accelerate

import torch
import transformers
from transformers import AutoTokenizer

model = "mlabonne/Llama-3-120B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the conversation with the Llama 3 chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the weights in float16 and spread them across the available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the full text (prompt + completion)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
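For readers curious what the chat template step produces, here is a hand-rolled approximation of the Llama 3 Instruct prompt format in plain Python. The function name `format_llama3_prompt` is hypothetical and exists only for illustration; in practice `tokenizer.apply_chat_template` should be preferred, since it reads the template shipped with the model.

```python
def format_llama3_prompt(messages: list[dict[str, str]]) -> str:
    """Hand-rolled approximation of the Llama 3 Instruct chat format."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    # Equivalent of add_generation_prompt=True: open the assistant turn.
    return prompt + "<|start_header_id|>assistant<|end_header_id|>\n\n"


messages = [{"role": "user", "content": "What is a large language model?"}]
print(format_llama3_prompt(messages))
```

Each turn is wrapped in a role header and closed with an end-of-turn token, and the trailing assistant header is what cues the model to start its reply.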