---
license: mit
base_model:
- Qwen/Qwen2.5-Math-7B
library_name: transformers
language:
- en
- zh
- fr
- es
- pt
- de
- it
- ru
- ja
- ko
- vi
- th
- ar
- fa
- he
- tr
- cs
- pl
- hi
- bn
- ur
- id
- ms
- lo
- my
- ceb
- km
- tl
- nl
tags:
- chemistry
- biology
- code
- text-generation-inference
- STEM
- unsloth
---
<div align="center">
<span style="font-family: default; font-size: 1.5em;">Athena-3</span>
<div>
Faster, Sharper, Smarter than Athena 1 and Athena 2
</div>
</div>
<br>
<div align="center" style="line-height: 1;">
<a href="https://github.com/Aayan-Mishra/Maverick-Search" style="margin: 2px;">
<img alt="Github Page" src="https://img.shields.io/badge/Toolkit-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://aayanmishra.com/blog/athena-3" target="_blank" style="margin: 2px;">
<img alt="Blogpost" src="https://img.shields.io/badge/Blogpost-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/Spestly/Athena-3-7B" style="margin: 2px;">
<img alt="HF Page" src="https://img.shields.io/badge/Athena-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
# **Athena-3-7B Model Card**
*Athena generated this model card!*
## **Model Overview**
**Athena-3-7B** is a 7.68-billion-parameter causal language model fine-tuned from Qwen2.5-Math-7B. This model is designed to excel in STEM reasoning, mathematics, and natural language processing tasks, offering advanced instruction-following and problem-solving capabilities.
## **Model Details**
- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, Attention QKV bias, and tied word embeddings
- **Parameters:** 7.68 billion total (6.93 billion non-embedding)
- **Layers:** 32
- **Attention Heads:** 24 for query and 4 for key-value (Grouped Query Attention)
- **Vocabulary Size:** 151,646 tokens
- **Context Length:** Supports up to 131,072 tokens
- **Languages Supported:** Over 29 languages, with strong emphasis on English and mathematical expressions
- **License:** MIT
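
These figures can be checked directly against the published checkpoint's configuration. A quick sanity check with `transformers` (assuming the repository exposes the standard Qwen2 config fields):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Spestly/Athena-3-7B")
print(cfg.num_hidden_layers)        # transformer layers
print(cfg.num_attention_heads)      # query heads
print(cfg.num_key_value_heads)      # key-value heads (Grouped Query Attention)
print(cfg.vocab_size)               # vocabulary size
print(cfg.max_position_embeddings)  # maximum context length
```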
## **Training Details**
Athena-3-7B was fine-tuned using the Unsloth framework on a single NVIDIA A100 GPU. The fine-tuning process spanned approximately 90 minutes over 60 epochs, utilizing a curated dataset focused on instruction-following, problem-solving, and advanced mathematics. This approach enhances the model's capabilities in academic and analytical tasks.
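
The exact training configuration has not been published. The following is a minimal sketch of a typical Unsloth LoRA fine-tuning setup for this base model; all hyperparameters and the dataset path are illustrative assumptions, not the values used for Athena-3-7B, and the `SFTTrainer` keyword arguments vary between `trl` versions:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit for memory-efficient fine-tuning
# (assumption: Athena-3's actual precision and sequence length were not published).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Math-7B",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and alpha here are common defaults, not Athena's values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical instruction dataset with a pre-rendered "text" field.
dataset = load_dataset("json", data_files="instruction_data.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=3,   # the card reports 60 epochs; 3 is shown here for brevity
        learning_rate=2e-4,
        bf16=True,
        output_dir="athena-3-7b-sft",
    ),
)
trainer.train()
```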
## **Intended Use**
Athena-3-7B is designed for a range of applications, including but not limited to:
- **STEM Reasoning:** Assisting with complex problem-solving and theoretical explanations.
- **Academic Assistance:** Supporting tutoring, step-by-step math solutions, and scientific writing.
- **General NLP Tasks:** Text generation, summarization, and question answering.
- **Data Analysis:** Interpreting and explaining mathematical and statistical data.
While Athena-3-7B is a powerful tool for various applications, it is not intended for real-time, safety-critical systems or for processing sensitive personal information.
## **How to Use**
To utilize Athena-3-7B, ensure that you have recent versions of the `transformers` and `accelerate` libraries installed (`accelerate` is required for `device_map="auto"`):
```bash
pip install -U transformers accelerate
```
Here's an example of how to load the Athena-3-7B model and generate a response:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Athena-3-7B"

# Load the weights in their native dtype and spread them across
# available devices (device_map="auto" requires accelerate).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics."
messages = [
    {"role": "system", "content": "You are Maverick, an AI assistant designed to be helpful."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into the model's expected prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
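For interactive use, you can stream tokens to stdout as they are generated. A small sketch using the built-in `TextStreamer` from `transformers`, continuing from the variables above:

```python
from transformers import TextStreamer

# Prints tokens as they are generated, skipping the prompt portion.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```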
### **Maverick Search usage**
To use this model with Maverick Search, refer to the [Maverick-Search repository](https://github.com/Aayan-Mishra/Maverick-Search).
## **Limitations**
Users should be aware of the following limitations:
- **Biases:** Athena-3-7B may exhibit biases present in its training data. Users should critically assess outputs, especially in sensitive contexts.
- **Knowledge Cutoff:** The model's knowledge is current up to August 2024. It may not be aware of events or developments occurring after this date.
- **Language Support:** While the model supports multiple languages, performance is strongest in English and technical content.
## **Acknowledgements**
Athena-3-7B builds upon the work of the Qwen team. Gratitude is also extended to the open-source AI community for their contributions to tools and frameworks that facilitated the development of Athena-3-7B.
## **License**
Athena-3-7B is released under the MIT License, permitting wide usage with proper attribution.
## **Contact**
- Email: [email protected] |