|
--- |
|
language: |
|
- en |
|
- de |
|
- fr |
|
- it |
|
- pt |
|
- hi |
|
- es |
|
- th |
|
pipeline_tag: text-generation |
|
tags: |
|
- llama-3.1 |
|
- conversational |
|
- instruction following |
|
- reasoning |
|
- function calling |
|
license: llama3.1 |
|
base_model: akjindal53244/Llama-3.1-Storm-8B |
|
--- |
|
|
|
 |
|
|
|
Authors: [Ashvini Kumar Jindal](https://www.linkedin.com/in/ashvini-jindal-26653262/), [Pawan Kumar Rajpoot](https://www.linkedin.com/in/pawanrajpoot/), [Ankur Parikh](https://www.linkedin.com/in/ankurnlpexpert/), [Akshita Sukhlecha](https://www.linkedin.com/in/akshita-sukhlecha/) |
|
|
|
**🤗 Hugging Face Announcement Blog**: https://huggingface.co/blog/akjindal53244/llama31-storm8b
|
|
|
**Ollama:** `ollama run ajindal/llama3.1-storm:8b`
|
|
|
<br> |
|
|
|
# Llama-3.1-Storm-8B-GGUF |
|
**This is the GGUF quantized version of [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B), for use with [llama.cpp](https://github.com/ggerganov/llama.cpp). The original BF16 model is available [here](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B).**
|
|
|
## TL;DR |
|
 |
|
|
|
We present [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B), a model that significantly outperforms Meta AI's [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) and [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) across diverse benchmarks, as shown in the performance comparison plot in the next section. Our approach consists of three key steps:
|
1. **Self-Curation**: We applied two self-curation methods to select approximately 1 million high-quality examples from a pool of ~2.8 million open-source examples. **Our curation criteria focused on educational value and difficulty level, using the same SLM for annotation rather than larger models (e.g., 70B, 405B).**
|
2. **Targeted fine-tuning**: We performed [Spectrum](https://arxiv.org/abs/2406.06623)-based targeted fine-tuning of the Llama-3.1-8B-Instruct model. Spectrum accelerates training by selectively training the layer modules with the highest signal-to-noise ratio (SNR) and freezing the rest; in our work, 50% of the layers were frozen. A minimal freezing sketch follows this list.
|
3. **Model Merging**: We merged our fine-tuned model with the [Llama-Spark](https://huggingface.co/arcee-ai/Llama-Spark) model using the [SLERP](https://huggingface.co/blog/mlabonne/merge-models#1-slerp) method, which smoothly interpolates the characteristics of both parents so that the resultant model captures the essence of each; a toy interpolation example also follows this list. [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) improves on Llama-3.1-8B-Instruct across 10 diverse benchmarks covering instruction-following, knowledge-driven QA, reasoning, truthful answer generation, and function calling.
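
To make steps 2 and 3 concrete, here is a minimal sketch of SNR-based layer freezing. The module names and the `trainable_modules` list are illustrative placeholders; the actual Spectrum tooling computes the per-module SNR and emits the list of modules to keep trainable.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical output of a Spectrum-style SNR analysis: the half of the
# layer modules with the highest signal-to-noise ratio stays trainable.
trainable_modules = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.3.mlp.gate_proj",
    # ... remaining high-SNR modules ...
]

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)

# Freeze everything, then unfreeze only the high-SNR modules.
for name, param in model.named_parameters():
    param.requires_grad = any(name.startswith(m) for m in trainable_modules)
```

And a toy illustration of the SLERP merge on two flattened weight vectors. Real merges (e.g., with mergekit) apply this tensor-by-tensor with configurable interpolation factors; the random vectors below are stand-ins for the two parent models' weights.

```python
import numpy as np

def slerp(t: float, w0: np.ndarray, w1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two flattened weight vectors."""
    v0 = w0 / (np.linalg.norm(w0) + eps)
    v1 = w1 / (np.linalg.norm(w1) + eps)
    theta = np.arccos(np.clip(np.dot(v0, v1), -1.0, 1.0))  # angle between weight directions
    if theta < eps:  # nearly parallel: fall back to linear interpolation
        return (1.0 - t) * w0 + t * w1
    return (np.sin((1.0 - t) * theta) * w0 + np.sin(t * theta) * w1) / np.sin(theta)

# Toy usage: blend two random "parent" tensors at t = 0.5.
w_fine_tuned, w_llama_spark = np.random.randn(2, 4096)
w_merged = slerp(0.5, w_fine_tuned, w_llama_spark)
```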
|
|
|
## 🚀 Introducing Llama-3.1-Storm-8B
|
[**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) builds upon the foundation of Llama-3.1-8B-Instruct, aiming to enhance both conversational and function calling capabilities within the 8B parameter model class.

*(Figure: benchmark comparison of Llama-3.1-Storm-8B with Meta-Llama-3.1-8B-Instruct (left subplot) and Hermes-3-Llama-3.1-8B (right subplot).)*
|
|
|
As shown in the left subplot of the above figure, the [**Llama-3.1-Storm-8B**](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) model improves over Meta-Llama-3.1-8B-Instruct across various benchmarks - Instruction-following ([IFEval](https://arxiv.org/abs/2311.07911)), Knowledge-driven QA ([GPQA](https://arxiv.org/abs/2311.12022), [MMLU-Pro](https://arxiv.org/pdf/2406.01574)), Reasoning ([ARC-C](https://arxiv.org/abs/1803.05457), [MuSR](https://arxiv.org/abs/2310.16049), [BBH](https://arxiv.org/pdf/2210.09261)), Reduced Hallucinations ([TruthfulQA](https://arxiv.org/abs/2109.07958)), and Function-Calling ([BFCL](https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard)). This improvement is particularly significant for AI developers and enthusiasts working with limited computational resources.
|
|
|
We also benchmarked our model with the recently published model [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) built on top of the Llama-3.1-8B-Instruct model. As shown in the right subplot of the above figure, **Llama-3.1-Storm-8B outperforms Hermes-3-Llama-3.1-8B on 7 out of 9 benchmarks**, with Hermes-3-Llama-3.1-8B surpassing Llama-3.1-Storm-8B on the MuSR benchmark and both models showing comparable performance on the BBH benchmark. |
|
|
|
|
|
## Llama-3.1-Storm-8B Model Strengths |
|
Llama-3.1-Storm-8B is a powerful generalist model useful for diverse applications. We invite the AI community to explore [Llama-3.1-Storm-8B](https://huggingface.co/collections/akjindal53244/storm-66ba6c96b7e24ecb592787a9) and look forward to seeing how it will be utilized in various projects and applications. |
|
|
|
<table>
  <tr>
    <td><strong>Model Strength</strong></td>
    <td><strong>Relevant Benchmarks</strong></td>
  </tr>
  <tr>
    <td>🎯 Improved Instruction Following</td>
    <td>IFEval Strict (+3.93%)</td>
  </tr>
  <tr>
    <td>🌐 Enhanced Knowledge Driven Question Answering</td>
    <td>GPQA (+7.21%), MMLU-Pro (+0.55%), AGIEval (+3.77%)</td>
  </tr>
  <tr>
    <td>🧠 Better Reasoning</td>
    <td>ARC-C (+3.92%), MuSR (+2.77%), BBH (+1.67%), AGIEval (+3.77%)</td>
  </tr>
  <tr>
    <td>🤖 Superior Agentic Capabilities</td>
    <td>BFCL: Overall Acc (+7.92%), BFCL: AST Summary (+12.32%)</td>
  </tr>
  <tr>
    <td>🚫 Reduced Hallucinations</td>
    <td>TruthfulQA (+9%)</td>
  </tr>
</table>
|
|
|
**Note**: All improvements are absolute gains over Meta-Llama-3.1-8B-Instruct. |
|
|
|
|
|
## Llama-3.1-Storm-8B Models |
|
1. `BF16`: [Llama-3.1-Storm-8B](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B) |
|
2. ⚡ `FP8`: [Llama-3.1-Storm-8B-FP8-Dynamic](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic)
|
3. ⚡ `GGUF`: [Llama-3.1-Storm-8B-GGUF](https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B-GGUF)
|
4. Ollama: `ollama run ajindal/llama3.1-storm:8b`
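
For the `FP8` variant, a minimal vLLM sketch (assuming a recent vLLM build with FP8 support; vLLM picks up the quantization config from the repo):

```python
from vllm import LLM, SamplingParams

# Load the FP8-quantized variant and run a greedy generation.
llm = LLM(model="akjindal53244/Llama-3.1-Storm-8B-FP8-Dynamic")
params = SamplingParams(max_tokens=200, temperature=0.0)
print(llm.generate(["What is 2+2?"], params)[0].outputs[0].text)
```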
|
|
|
## 💻 How to Use the GGUF Model
|
|
|
```bash |
|
pip install llama-cpp-python |
|
``` |
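
The example below runs fully on the CPU (`n_gpu_layers=0`). To offload layers to a GPU, llama-cpp-python must be built with GPU support, e.g. for CUDA (per the llama-cpp-python install docs):

```bash
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```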
|
|
|
```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Download the GGUF model
model_name = "akjindal53244/Llama-3.1-Storm-8B-GGUF"
model_file = "Llama-3.1-Storm-8B.Q8_0.gguf"  # the file used in this example: an 8-bit (Q8_0) quant; other quantization levels are available in the model repo
model_path = hf_hub_download(model_name, filename=model_file)

## Instantiate model from downloaded file
llm = Llama(
    model_path=model_path,
    n_ctx=16000,     # Context length to use
    n_threads=32,    # Number of CPU threads to use
    n_gpu_layers=0   # Number of model layers to offload to GPU
)

generation_kwargs = {
    "max_tokens": 200,       # Maximum number of tokens to generate
    "stop": ["<|eot_id|>"],  # Stop at the Llama-3.1 end-of-turn token
    "echo": False,           # Do not echo the prompt back in the output
    "top_k": 1               # top_k=1 is greedy decoding; set > 1 to enable sampling
}

prompt = "What is 2+2?"
res = llm(prompt, **generation_kwargs)
print(res["choices"][0]["text"])
```
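
The raw-prompt call above bypasses the Llama-3.1 chat template. For chat-style use, llama-cpp-python can apply the template stored in the GGUF metadata via `create_chat_completion`; a minimal sketch reusing the `llm` instance from above:

```python
# Chat-style inference: the model's chat template is applied automatically.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
res = llm.create_chat_completion(messages=messages, max_tokens=200)
print(res["choices"][0]["message"]["content"])
```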
|
|
|
### Function Calling Example with [Ollama](https://ollama.com/) |
|
```python
import ollama

tools = [
    {
        'type': 'function',
        'function': {
            'name': 'get_current_weather',
            'description': 'Get the current weather for a city',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': 'The name of the city',
                    },
                },
                'required': ['city'],
            },
        },
    },
    {
        'type': 'function',
        'function': {
            'name': 'get_places_to_visit',
            'description': 'Get places to visit in a city',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': 'The name of the city',
                    },
                },
                'required': ['city'],
            },
        },
    },
]

response = ollama.chat(
    model='ajindal/llama3.1-storm:8b',
    messages=[
        {'role': 'system', 'content': 'Do not answer any vulgar questions.'},
        {'role': 'user', 'content': 'What is the weather in Toronto and San Francisco?'}
    ],
    tools=tools
)

# Expected response: {'role': 'assistant', 'content': "<tool_call>{'tool_name': 'get_current_weather', 'tool_arguments': {'city': 'Toronto'}}</tool_call>"}
print(response['message'])
```
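
The model emits tool calls as text inside `<tool_call>` tags, so the caller must parse and execute them. A minimal dispatch sketch follows; the `get_current_weather` stub and the parsing logic are illustrative assumptions, not part of the model card:

```python
import ast
import re

def get_current_weather(city: str) -> str:
    # Stub for illustration; replace with a real weather API call.
    return f"22 degrees C and sunny in {city}"

TOOL_REGISTRY = {"get_current_weather": get_current_weather}

# Extract each <tool_call>...</tool_call> span from the assistant message.
content = response['message']['content']
for payload in re.findall(r'<tool_call>(.*?)</tool_call>', content, re.DOTALL):
    call = ast.literal_eval(payload)  # the payload is a Python-style dict literal
    result = TOOL_REGISTRY[call['tool_name']](**call['tool_arguments'])
    print(result)  # e.g., append as a follow-up message and call ollama.chat again
```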
|
|
|
|
|
## Alignment Note |
|
While **Llama-3.1-Storm-8B** did not undergo an explicit model alignment process, it may still retain some alignment properties inherited from the Meta-Llama-3.1-8B-Instruct model. |
|
|
|
## Cite Our Work |
|
```bibtex
@misc{ashvini_kumar_jindal_2024,
  author    = {Jindal, Ashvini Kumar and Rajpoot, Pawan Kumar and Parikh, Ankur and Sukhlecha, Akshita},
  title     = {Llama-3.1-Storm-8B},
  year      = {2024},
  url       = {https://huggingface.co/akjindal53244/Llama-3.1-Storm-8B},
  doi       = {10.57967/hf/2902},
  publisher = {Hugging Face}
}
```
|
|
|
## Support Our Work |
|
With three team members spread across three time zones, we won the [NeurIPS LLM Efficiency Challenge 2023](https://llm-efficiency-challenge.github.io/) and four other competitions in the finance and Arabic LLM space. We have also published a [SOTA mathematical reasoning model](https://huggingface.co/akjindal53244/Arithmo-Mistral-7B).
|
|
|
**Llama-3.1-Storm-8B** is our most valuable contribution to the open-source community so far. We are committed to developing efficient generalist LLMs. **We're seeking both computational resources and innovative collaborators to drive this initiative forward.**