lvkaokao bconsolvo committed on
Commit
880597f
1 Parent(s): b80b062

Comprehensive overhaul of README.md for better documentation of the model (#4)


- Comprehensive overhaul of README.md for better documentation of the model (a9ae754cf7ede1a017dc19ba4765a97ba5e7d934)


Co-authored-by: Benjamin Consolvo <[email protected]>

Files changed (1)
  1. README.md +129 -40
README.md CHANGED
@@ -1,35 +1,75 @@
  ---
  license: apache-2.0
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- This model is a fine-tuned model for Chat based on [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b) with **max_seq_lenght=2048** on various open source dataset. For the details of the used dataset, please refer to [Intel/neural-chat-dataset-v1-1](https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1).

- ## Model date
- Neural-chat-7b-v1.1 was trained between June and July 2023.

- ## Evaluation
- We use the same evaluation metrics as [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) which uses [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master), a unified framework to test generative language models on a large number of different evaluation tasks.

- | Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ |
- | --- | --- | --- | --- | --- | --- |
- |[mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b)| 47.4 | 47.61 | 77.56 | 31 | 33.43 |
- | [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) | **49.95** | 46.5 | 75.55 | 37.60 | 40.17 |
- | **Ours** | **51.41** | 50.09 | 76.69 | 38.79 | 40.07 |

- ### Bias evaluation

- Following the blog [evaluating-llm-bias](https://huggingface.co/blog/evaluating-llm-bias), we select 10000 samples randomly from [allenai/real-toxicity-prompts](https://huggingface.co/datasets/allenai/real-toxicity-prompts) to evaluate toxicity bias in Language Models

- | Model | Toxicity Rito ↓|
- | --- | --- |
- |[mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b)| 0.027 |
- | **Ours** | 0.0264 |
-
-
- ## Training procedure

  ### Training hyperparameters

@@ -48,9 +88,10 @@ The following hyperparameters were used during training:
  - lr_scheduler_warmup_ratio: 0.02
  - num_epochs: 3.0

- ## Inference with transformers

- ```shell
  import transformers
  model = transformers.AutoModelForCausalLM.from_pretrained(
  'Intel/neural-chat-7b-v1-1',
@@ -58,10 +99,10 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
  )
  ```

- ## Inference with INT8
- Follow the instructions [link](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch/text-generation/quantization) to install the necessary dependencies. Use the below command to quantize the model using Intel Neural Compressor [link](https://github.com/intel/neural-compressor) and accelerate the inference.

- ```shell
  python run_generation.py \
  --model Intel/neural-chat-7b-v1-1 \
  --quantize \
@@ -70,6 +111,55 @@ python run_generation.py \
  --ipex
  ```

  ### Examples

  - code generation
@@ -82,21 +172,20 @@ python run_generation.py \
  ![trip](examples/trip.png)

  ## Ethical Considerations and Limitations
- neural-chat-7b-v1-1 can produce factually incorrect output, and should not be relied on to produce factually accurate information. neural-chat-7b-v1-1 was trained on various instruction/chat datasets based on [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b). Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

- Therefore, before deploying any applications of neural-chat-7b-v1-1, developers should perform safety testing.

- ## Disclaimer
-
- The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please cosult an attorney before using this model for commercial purposes.
-
-
-
- ## Organizations developing the model

- The NeuralChat team with members from Intel/SATG/AIA/AIPT. Core team members: Kaokao Lv, Liang Lv, Chang Wang, Wenxin Zhang, Xuhui Ren, and Haihao Shen.

- ## Useful links
  * Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
  * Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
  * Intel Extension for PyTorch [link](https://github.com/intel/intel-extension-for-pytorch)

  ---
  license: apache-2.0
+ tags:
+ - LLMs
+ - Intel
+ base_model: mosaicml/mpt-7b
+ datasets:
+ - Intel/neural-chat-dataset-v1-1
+ - allenai/real-toxicity-prompts
+ language:
+ - en
+ model-index:
+ - name: neural-chat-7b-v1-1
+   results:
+   - task:
+       type: Large Language Model
+       name: Large Language Model
+     dataset:
+       type: Intel/neural-chat-dataset-v1-1
+       name: Intel/neural-chat-dataset-v1-1
+     metrics:
+     - type: Average
+       value: 51.41
+       name: Average
+       verified: true
+     - type: ARC (25-shot)
+       value: 50.09
+       name: ARC (25-shot)
+       verified: true
+     - type: HellaSwag (10-shot)
+       value: 76.69
+       name: HellaSwag (10-shot)
+       verified: true
+     - type: MMLU (5-shot)
+       value: 38.79
+       name: MMLU (5-shot)
+       verified: true
+     - type: TruthfulQA (0-shot)
+       value: 40.07
+       name: TruthfulQA (0-shot)
+       verified: true
+     - type: Toxicity Ratio
+       value: 0.0264
+       name: Toxicity Ratio

+ ---
+ ## Model Details: Neural-Chat-v1-1

+ This model is a fine-tuned model for chat based on [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b) with a max sequence length of 2048 on the dataset [Intel/neural-chat-dataset-v1-1](https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1), which is a compilation of open-source datasets.

+ <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/6297f0e30bd2f58c647abb1d/fWCqhGKZQKNuLmvj093rB.jpeg" width="500"/>
+ Image generated with the prompt "an image of a brain that has to do with LLMs" at https://clipdrop.co/stable-diffusion-turbo.
+ </p>

+ | Model Detail | Description |
+ | ----------- | ----------- |
+ | Model Authors | Intel. The NeuralChat team with members from DCAI/AISE/AIPT. Core team members: Kaokao Lv, Liang Lv, Chang Wang, Wenxin Zhang, Xuhui Ren, and Haihao Shen. |
+ | Date | July, 2023 |
+ | Version | v1-1 |
+ | Type | 7B Large Language Model |
+ | Paper or Other Resources | Base model: [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b); Dataset: [Intel/neural-chat-dataset-v1-1](https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1) |
+ | License | Apache 2.0 |
+ | Questions or Comments | [Community Tab](https://huggingface.co/Intel/neural-chat-7b-v1-1/discussions) and [Intel DevHub Discord](https://discord.gg/rv2Gp55UJQ) |

+ | Intended Use | Description |
+ | ----------- | ----------- |
+ | Primary intended uses | You can use the fine-tuned model for several language-related tasks. Check out the [LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) to see this model's performance relative to other LLMs. |
+ | Primary intended users | Anyone doing inference on language-related tasks. |
+ | Out-of-scope uses | This model in most cases will need to be fine-tuned for your particular task. The model should not be used to intentionally create hostile or alienating environments for people. |

+ ## How To Use

  ### Training hyperparameters

  - lr_scheduler_warmup_ratio: 0.02
  - num_epochs: 3.0

+ ## Use The Model

+ ### Loading the model with Transformers
+ ```python
  import transformers
  model = transformers.AutoModelForCausalLM.from_pretrained(
  'Intel/neural-chat-7b-v1-1',

  )
  ```

+ ### Inference with INT8
+ Follow the instructions at the [GitHub repository](https://github.com/intel/intel-extension-for-transformers/tree/main/examples/huggingface/pytorch/text-generation/quantization) to install the necessary dependencies for quantization to INT8. Use the below command to quantize the model using [Intel Neural Compressor](https://github.com/intel/neural-compressor) to accelerate inference.

+ ```bash
  python run_generation.py \
  --model Intel/neural-chat-7b-v1-1 \
  --quantize \

  --ipex
  ```
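
For intuition only: the command above runs Intel's optimized INT8 flow, but the basic idea of weight quantization can be sketched with plain PyTorch dynamic quantization. This is not the Intel Neural Compressor path and will not match its accuracy or performance:

```python
import torch

# Convert the Linear layers' weights to INT8; activations are
# quantized dynamically at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```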

+ | Factors | Description |
+ | ----------- | ----------- |
+ | Groups | More details about the dataset can be found at [Intel/neural-chat-dataset-v1-1](https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1). |
+ | Instrumentation | The performance of the model can vary depending on the inputs to the model. In this case, the prompts provided can drastically change the prediction of the language model. |
+ | Environment | - |
+ | Card Prompts | Model deployment on varying hardware and software will change model performance. |
+
+ | Metrics | Description |
+ | ----------- | ----------- |
+ | Model performance measures | The model metrics are: ARC, HellaSwag, MMLU, and TruthfulQA. Bias was also evaluated using the toxicity ratio (see Quantitative Analyses below). The model performance was evaluated against other LLMs according to the standards at the time the model was published. |
+ | Decision thresholds | No decision thresholds were used. |
+ | Approaches to uncertainty and variability | - |
+
+
+ ## Training Data
+
+ The training data are from [Intel/neural-chat-dataset-v1-1](https://huggingface.co/datasets/Intel/neural-chat-dataset-v1-1). The total number of instruction samples is about 1.1M, and the number of tokens is 326M. This dataset is composed of several other datasets:
+
+ | Type | Language | Dataset | Number |
+ |--| ---- |--------|----|
+ | HC3 | en | [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3) | 24K |
+ | dolly | en | [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) | 15K |
+ | alpaca-zh | zh | [tigerbot-alpaca-zh-0.5m](https://huggingface.co/datasets/TigerResearch/tigerbot-alpaca-zh-0.5m) | 500K |
+ | alpaca-en | en | [TigerResearch/tigerbot-alpaca-en-50k](https://huggingface.co/datasets/TigerResearch/tigerbot-alpaca-en-50k) | 50K |
+ | math | en | [tigerbot-gsm-8k-en](https://huggingface.co/datasets/TigerResearch/tigerbot-gsm-8k-en) | 8K |
+ | general | en | [tigerbot-stackexchange-qa-en-0.5m](https://huggingface.co/datasets/TigerResearch/tigerbot-stackexchange-qa-en-0.5m) | 500K |
+
+ Note: There is no contamination from the GSM8k test set, as this is not a part of this dataset.
+
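To inspect the training mixture directly, the dataset should load with the standard `datasets` API (a sketch; the `train` split name is an assumption):

```python
from datasets import load_dataset

# Downloads the instruction mixture from the Hub.
ds = load_dataset("Intel/neural-chat-dataset-v1-1")
print(ds)              # available splits and features
print(ds["train"][0])  # one instruction sample, assuming a "train" split
```
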
+ ## Quantitative Analyses
+
+ ### LLM metrics
+ We used the same evaluation metrics as [HuggingFaceH4/open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), which uses [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master), a unified framework to test generative language models on a large number of different evaluation tasks.
+
+ | Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ |
+ | --- | --- | --- | --- | --- | --- |
+ |[mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b)| 47.4 | 47.61 | 77.56 | 31 | 33.43 |
+ | [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) | 49.95 | 46.5 | 75.55 | 37.60 | 40.17 |
+ | [Intel/neural-chat-7b-v1-1](https://huggingface.co/Intel/neural-chat-7b-v1-1) | **51.41** | 50.09 | 76.69 | 38.79 | 40.07 |
+
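For reproduction, the master-branch harness of that era exposed a CLI roughly like the following; the `hf-causal` model type and task name come from the harness documentation, and the few-shot count follows the table header (one run per task):

```bash
# Illustrative invocation: 25-shot ARC-Challenge with the EleutherAI harness
python main.py \
    --model hf-causal \
    --model_args pretrained=Intel/neural-chat-7b-v1-1 \
    --tasks arc_challenge \
    --num_fewshot 25
```
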
+ ### Bias evaluation
+
+ Following the blog [evaluating-llm-bias](https://huggingface.co/blog/evaluating-llm-bias), we randomly selected 10,000 samples from [allenai/real-toxicity-prompts](https://huggingface.co/datasets/allenai/real-toxicity-prompts) to evaluate toxicity bias.
+
+ | Model | Toxicity Ratio ↓ |
+ | --- | --- |
+ |[mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b)| 0.027 |
+ | [Intel/neural-chat-7b-v1-1](https://huggingface.co/Intel/neural-chat-7b-v1-1) | 0.0264 |
+
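A sketch of that measurement with the `evaluate` library, following the linked blog. The `aggregation="ratio"` option reports the fraction of generations whose toxicity score exceeds 0.5, which is the statistic the table reports; generating the continuations themselves is elided here:

```python
import evaluate
from datasets import load_dataset

# 10,000 random prompts from RealToxicityPrompts, as in the card.
prompts = load_dataset("allenai/real-toxicity-prompts", split="train")
prompts = prompts.shuffle(seed=42).select(range(10_000))

toxicity = evaluate.load("toxicity", module_type="measurement")

# `generations` stands in for the model's continuations of the prompts.
generations = ["..."]  # placeholder
result = toxicity.compute(predictions=generations, aggregation="ratio")
print(result["toxicity_ratio"])
```
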
  ### Examples

  - code generation

  ![trip](examples/trip.png)

  ## Ethical Considerations and Limitations
+ Neural-chat-7b-v1-1 can produce factually incorrect output, and should not be relied on to produce factually accurate information. Neural-chat-7b-v1-1 was trained on various instruction/chat datasets based on [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b). Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased, or otherwise offensive outputs.

+ Therefore, before deploying any applications of the model, developers should perform safety testing.

+ ## Caveats and Recommendations

+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

+ Here are some useful GitHub repository links to learn more about Intel's open-source AI software:
  * Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
  * Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
  * Intel Extension for PyTorch [link](https://github.com/intel/intel-extension-for-pytorch)
+
+ ## Disclaimer
+
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+