Commit d458083 (verified) · Spestly committed · 1 parent: efeeb21

Update README.md

Files changed (1): README.md (+24 -25)

![Header](Maverick.png)

# **Maverick-1-7B Model Card**

## **Model Overview**

**Maverick-1-7B** is a 7.68-billion-parameter causal language model fine-tuned from Qwen2.5-Math-7B. The model is designed to excel in STEM reasoning, mathematics, and general natural language processing tasks, offering advanced instruction-following and problem-solving capabilities.

## **Model Details**

- **Model Developer:** Aayan Mishra
- **Model Type:** Causal Language Model
- **Architecture:** Transformer with Rotary Position Embeddings (RoPE), SwiGLU activation, RMSNorm, Attention QKV bias, and tied word embeddings
- **Parameters:** 7.68 billion total (6.93 billion non-embedding)
- **Layers:** 32
- **Attention Heads:** 24 for query and 4 for key-value (Grouped Query Attention)
- **Vocabulary Size:** Approximately 151,646 tokens
- **Context Length:** Supports up to 131,072 tokens
- **Languages Supported:** Over 29 languages, with strong emphasis on English and mathematical expressions
- **License:** MIT
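
These architecture details can be sanity-checked against the repository's published configuration. The following is a minimal sketch using `transformers`; it assumes the standard Qwen2-style config fields (`num_hidden_layers`, `num_key_value_heads`, and so on), and the repository's `config.json` remains the authoritative source:

```python
from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the architecture.
config = AutoConfig.from_pretrained("Spestly/Maverick-1-7B")

print(config.model_type)               # expected: "qwen2"
print(config.num_hidden_layers)        # transformer layers
print(config.num_attention_heads)      # query heads
print(config.num_key_value_heads)      # key/value heads (Grouped Query Attention)
print(config.vocab_size)               # vocabulary size
print(config.max_position_embeddings)  # maximum context length
print(config.tie_word_embeddings)      # whether word embeddings are tied
```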

## **Training Details**

Maverick-1-7B was fine-tuned using the Unsloth framework on a single NVIDIA A100 GPU. The fine-tuning process spanned approximately 90 minutes over 60 epochs, using a curated dataset focused on instruction following, problem solving, and advanced mathematics. This approach strengthens the model's capabilities in academic and analytical tasks.
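
The dataset and exact hyperparameters behind this run are not published. Purely as an illustration of the kind of Unsloth setup described above (loading a base checkpoint and attaching LoRA adapters), a fine-tuning script typically begins like the sketch below; the base model name is assumed from the overview, and every numeric value is a placeholder rather than the actual recipe:

```python
from unsloth import FastLanguageModel

# Illustrative only: the real sequence length, quantization, and LoRA settings
# used for Maverick-1-7B are not published.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-Math-7B",  # assumed base checkpoint (see Model Overview)
    max_seq_length=4096,                # placeholder
    load_in_4bit=True,                  # placeholder
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                               # placeholder LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
)
```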

## **Intended Use**

Maverick-1-7B is designed for a range of applications, including but not limited to:

- **STEM Reasoning:** Assisting with complex problem-solving and theoretical explanations.
- **Academic Assistance:** Supporting tutoring, step-by-step math solutions, and scientific writing.
- **General NLP Tasks:** Text generation, summarization, and question answering.
- **Data Analysis:** Interpreting and explaining mathematical and statistical data.

While Maverick-1-7B is a powerful tool for various applications, it is not intended for real-time, safety-critical systems or for processing sensitive personal information.

## **How to Use**

To utilize Maverick-1-7B, ensure that you have the latest version of the `transformers` library installed:

```bash
pip install transformers
```

Here's an example of how to load the Maverick-1-7B model and generate a response:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Spestly/Maverick-1-7B"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt
prompt = "Explain the concept of gradient descent."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate and keep only the newly generated tokens
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
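
For quick experiments, the same checkpoint can also be driven through the high-level `pipeline` API. This is a minimal sketch (recent `transformers` releases accept chat-style messages directly; the sampling settings are illustrative, not recommended values):

```python
from transformers import pipeline

# Wrap the checkpoint in a text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="Spestly/Maverick-1-7B",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Differentiate f(x) = x^2 * sin(x)."}
]

# The pipeline applies the model's chat template and appends the reply to the conversation.
result = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
print(result[0]["generated_text"][-1]["content"])
```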
 
## **Limitations**

Users should be aware of the following limitations:

- **Biases:** Maverick-1-7B may exhibit biases present in its training data. Users should critically assess outputs, especially in sensitive contexts.
- **Knowledge Cutoff:** The model's knowledge is current up to August 2024. It may not be aware of events or developments occurring after this date.
- **Language Support:** While the model supports multiple languages, performance is strongest in English and technical content.

## **Acknowledgements**

Maverick-1-7B builds upon the work of the Qwen team. Gratitude is also extended to the open-source AI community for their contributions to the tools and frameworks that facilitated the development of Maverick-1-7B.

## **License**

Maverick-1-7B is released under the [MIT License](https://opensource.org/license/mit), permitting wide usage with proper attribution.

## **Contact**

- Email: [email protected]