---
base_model:
- Qwen/Qwen2.5-14B-Instruct
license: mit
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
tags:
- chemistry
- biology
- code
- text-generation-inference
- STEM
- unsloth
- transformers
- qwen2
- trl
---
30
+ <div align="center">
31
+ <span style="font-family: default; font-size: 1.5em;">Athena-3</span>
32
+ <div>
33
+ 🚀 Faster, Sharper, Smarter than Athena 1 and Athena 2🌟
34
+ </div>
35
+ </div>
36
+ <br>
37
+ <div align="center" style="line-height: 1;">
38
+ <a href="https://github.com/Aayan-Mishra/Maverick-Search" style="margin: 2px;">
39
+ <img alt="Github Page" src="https://img.shields.io/badge/Toolkit-000000?style=for-the-badge&logo=github&logoColor=000&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
40
+ </a>
41
+ <a href="https://aayanmishra.com/blog/athena-3" target="_blank" style="margin: 2px;">
42
+ <img alt="Blogpost" src="https://img.shields.io/badge/Blogpost-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
43
+ </a>
44
+ <a href="https://huggingface.co/Spestly/Athena-3-14B" style="margin: 2px;">
45
+ <img alt="HF Page" src="https://img.shields.io/badge/Athena-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
46
+ </a>
47
+ </div>
48
+

## **Athena-3**

*Athena generated this model card!*

**Athena-3-14B** is a 14.7-billion-parameter causal language model fine-tuned from Qwen2.5-14B-Instruct. It is designed to produce highly fluent, contextually aware, and logically sound output across a broad range of NLP and reasoning tasks, balancing instruction-following with generative flexibility.

## **Model Details**

- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, and Attention QKV bias
- **Parameters:** 14.7 billion total (13.1 billion non-embedding)
- **Layers:** 48
- **Attention Heads:** 40 for query and 8 for key-value (Grouped Query Attention)
- **Vocabulary Size:** 151,646 tokens
- **Context Length:** Supports up to 131,072 tokens
- **Languages Supported:** Over 29 languages, with the strongest performance in English, Chinese, and multilingual instruction tasks
- **License:** MIT
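
These figures follow the Qwen2.5-14B-Instruct base configuration. If you want to verify them, the snippet below reads the numbers straight from the published config (assuming the Hugging Face Hub is reachable from your environment):

```python
from transformers import AutoConfig

# Pull just the config (a few KB), not the weights
config = AutoConfig.from_pretrained("Spestly/Athena-3-14B")
print(config.num_hidden_layers)    # transformer layers
print(config.num_attention_heads)  # query attention heads
print(config.num_key_value_heads)  # key-value heads (Grouped Query Attention)
print(config.vocab_size)           # embedding rows (may be padded beyond the tokenizer's vocabulary)
```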

## **Training Details**

Athena-3-14B was fine-tuned using the Unsloth framework on a single NVIDIA A100 GPU. The fine-tuning run took approximately 90 minutes over 60 epochs on a curated instruction-tuning dataset. The result is tuned for generalist NLP performance with a focus on reasoning, alignment, and fluency.
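
For readers who want to reproduce a comparable setup, here is a minimal sketch of an Unsloth + TRL supervised fine-tuning run. The dataset file, LoRA rank, and hyperparameters are illustrative placeholders rather than the actual Athena-3 recipe, and the exact `SFTTrainer` arguments vary between `trl` versions:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model with Unsloth; 4-bit loading keeps it within a single A100
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-14B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank and alpha here are hypothetical, not the real recipe)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder instruction dataset with a pre-rendered "text" column
dataset = load_dataset("json", data_files="instructions.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=60,  # epoch count reported above
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
        output_dir="athena-3-14b-sft",
    ),
)
trainer.train()
```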

## **Intended Use**

Athena-3-14B is well suited to a wide variety of tasks, including:

- **Instruction Following:** Handling complex prompts with step-by-step logical output
- **Writing Assistance:** Generating essays, emails, and coherent narratives
- **NLP Tasks:** Summarization, question answering, translation, and text classification
- **STEM Support:** Reasoning through academic and technical content

Although versatile, Athena-3-14B is not intended for safety-critical applications or for handling private or sensitive information.

## **How to Use**

To use Athena-3-14B, make sure you have the latest version of the `transformers` library installed:

```bash
pip install -U transformers
```

Here's an example of how to load Athena-3-14B and generate a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Athena-3-14B"

# Load the model in its native precision, spread across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of entropy in thermodynamics."
messages = [
    {"role": "system", "content": "You are Athena, an AI assistant designed to be helpful."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated reply is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
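
For quick experiments, recent `transformers` releases also let the `text-generation` pipeline consume chat messages directly. A minimal sketch (same prompt as above; assumes a pipeline version with chat support):

```python
from transformers import pipeline

# Build a chat-aware generation pipeline (weights load once, reused across calls)
generator = pipeline(
    "text-generation",
    model="Spestly/Athena-3-14B",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Athena, an AI assistant designed to be helpful."},
    {"role": "user", "content": "Explain the concept of entropy in thermodynamics."},
]
result = generator(messages, max_new_tokens=512)

# The pipeline returns the whole conversation; the last message is the new reply
print(result[0]["generated_text"][-1]["content"])
```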

### **Maverick Search usage 🔍**

To use this model with Maverick Search, please refer to this [repository](https://github.com/Aayan-Mishra/Maverick-Search).

## **Limitations**

Users should be aware of the following limitations:

- **Biases:** Athena-3-14B may reflect biases from its pretraining and fine-tuning data. Outputs should be reviewed for fairness and accuracy.
- **Knowledge Cutoff:** The model's knowledge is current as of August 2024.
- **Multilingual Performance:** Performance varies by language, with the strongest capabilities in English and other well-represented languages.

## **Acknowledgements**

Athena-3-14B builds upon the Qwen2.5-14B foundation. Special thanks to the open-source ecosystem and to Unsloth for enabling efficient fine-tuning workflows.

## **License**

Athena-3-14B is released under the MIT License, permitting broad use and distribution with proper attribution.

## **Contact**

  - Email: [email protected]