---
library_name: transformers
---

# **Maverick-1-7B Model Card**

## **Model Overview**

**Maverick-1-7B** is a 7.68-billion-parameter causal language model fine-tuned from Qwen2.5-Math-7B. This model is designed to excel in STEM reasoning, mathematics, and natural language processing tasks, offering advanced instruction-following and problem-solving capabilities.
## **Model Details**

- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, attention QKV bias, and tied word embeddings
- **Parameters:** 7.68 billion total (6.93 billion non-embedding)
- **Layers:** 32
- **Attention Heads:** 24 for query and 4 for key-value (Grouped Query Attention)
- **Vocabulary Size:** Approximately 151,646 tokens
- **Context Length:** Supports up to 131,072 tokens
- **Languages Supported:** Over 29 languages, with strong emphasis on English and mathematical expressions
- **License:** MIT
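
The figures above can be checked directly against the repository's configuration. A minimal sketch, assuming the checkpoint ships a standard Qwen2-style `config.json` (attribute names follow the `transformers` convention for this model family):

```python
from transformers import AutoConfig

# Fetch only the configuration; no model weights are downloaded.
config = AutoConfig.from_pretrained("Spestly/Maverick-1-7B")

print(config.num_hidden_layers)        # transformer layers
print(config.num_attention_heads)      # query heads
print(config.num_key_value_heads)      # key-value heads (GQA when fewer than query heads)
print(config.vocab_size)               # vocabulary size
print(config.max_position_embeddings)  # maximum context length
print(config.tie_word_embeddings)      # whether input/output embeddings are tied
```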
## **Training Details**

Maverick-1-7B was fine-tuned using the Unsloth framework on a single NVIDIA A100 GPU. The fine-tuning process spanned approximately 90 minutes over 60 epochs, utilizing a curated dataset focused on instruction-following, problem-solving, and advanced mathematics. This approach enhances the model's capabilities in academic and analytical tasks.
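
The exact training recipe is not published; the following is a rough sketch of the kind of Unsloth setup described above. The base checkpoint follows the model overview, while the sequence length, quantization choice, LoRA rank, and target modules are illustrative assumptions, not documented values:

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit so fine-tuning fits on a single A100.
# (Quantization choice is an assumption, not a documented detail.)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Math-7B",  # base model named in the overview
    max_seq_length=4096,                # illustrative; not the published value
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is updated.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                               # illustrative LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)
```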
## **Intended Use**

Maverick-1-7B is designed for a range of applications, including but not limited to:

- **STEM Reasoning:** Assisting with complex problem-solving and theoretical explanations.
- **Academic Assistance:** Supporting tutoring, step-by-step math solutions, and scientific writing.
- **General NLP Tasks:** Text generation, summarization, and question answering.
- **Data Analysis:** Interpreting and explaining mathematical and statistical data.

While Maverick-1-7B is a powerful tool for various applications, it is not intended for real-time, safety-critical systems or for processing sensitive personal information.
## **How to Use**

To utilize Maverick-1-7B, ensure that you have the latest version of the `transformers` library installed:

```bash
pip install transformers
```

Here's an example of how to load the Maverick-1-7B model and generate a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Maverick-1-7B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The prompt below is illustrative; substitute your own instruction.
prompt = "Explain the Pythagorean theorem step by step."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
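
For quick experiments, the same checkpoint can also be driven through the high-level `pipeline` API. A minimal sketch, assuming a recent `transformers` release in which the text-generation pipeline accepts chat-style message lists:

```python
from transformers import pipeline

# Wrap the checkpoint in a text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="Spestly/Maverick-1-7B",
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "What is the derivative of x**3?"}]
result = generator(messages, max_new_tokens=256)

# With chat-style input, `generated_text` holds the full message list;
# the last entry is the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```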
## **Limitations**

Users should be aware of the following limitations:

- **Biases:** Maverick-1-7B may exhibit biases present in its training data. Users should critically assess outputs, especially in sensitive contexts.
- **Knowledge Cutoff:** The model's knowledge is current up to August 2024. It may not be aware of events or developments occurring after this date.
- **Language Support:** While the model supports multiple languages, performance is strongest in English and technical content.
## **Acknowledgements**

Maverick-1-7B builds upon the work of the Qwen team. Gratitude is also extended to the open-source AI community for their contributions to the tools and frameworks that facilitated the development of Maverick-1-7B.
## **License**

Maverick-1-7B is released under the MIT License, permitting wide usage with proper attribution.
## **Contact**

- Email: [email protected]