cicdatopea committed
Commit c056ef0 · verified · 1 Parent(s): 72df4bf

update to iter200 results

Files changed (1)
  1. README.md +84 -84
README.md CHANGED

---
datasets:
- NeelNanda/pile-10k
base_model:
- MiniMaxAI/MiniMax-Text-01
---

## Model Details

This model is an int4 model with group_size 128 and symmetric quantization of [MiniMaxAI/MiniMax-Text-01](https://huggingface.co/MiniMaxAI/MiniMax-Text-01), generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm. This model is in AutoRound format, which is **NOT** supported by other serving frameworks such as vLLM.

Please follow the [license](https://huggingface.co/MiniMaxAI/MiniMax-Text-01/blob/main/LICENSE) of the original model.
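
For intuition, here is a minimal, self-contained sketch of what "int4, group_size 128, symmetric" means numerically. This is not the auto-round algorithm itself (which additionally tunes the rounding); it only illustrates the quantization arithmetic the format implies:

```python
import torch

def quantize_sym_int4(w: torch.Tensor, group_size: int = 128):
    """Symmetric int4 group-wise quantization: one shared scale per group of
    128 weights, no zero point, values rounded into the int4 range [-8, 7]."""
    groups = w.reshape(-1, group_size)
    scale = groups.abs().amax(dim=1, keepdim=True) / 7  # map max magnitude to level 7
    q = torch.clamp(torch.round(groups / scale), -8, 7)
    return q.to(torch.int8), scale

w = torch.randn(4096)                          # stand-in for one weight row
q, scale = quantize_sym_int4(w)
w_hat = (q.float() * scale).reshape(w.shape)   # dequantized weights
print((w - w_hat).abs().max())                 # worst-case rounding error for this row
```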

## INT4 Inference on CUDA (4x80GB)

Requirements

```bash
pip3 install git+https://github.com/intel/auto-round.git@bf16_inference
pip3 install auto-gptq
```

 
```python
from auto_round import AutoRoundConfig  ## must import for autoround format
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

quantized_model_dir = "OPEA/MiniMax-Text-01-int4-sym-inc-preview"

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(quantized_model_dir,
                                             trust_remote_code=True,
                                             torch_dtype=torch.bfloat16,  ## must use bf16
                                             device_map="auto")


## workaround for fp16 overflow: clamp activations to the fp16 range, then continue in bf16
def forward_hook(module, input, output):
    return torch.clamp(output, -65504, 65504).to(torch.bfloat16)


def register_fp16_pre_hooks(model):
    for name, module in model.named_modules():
        ## the registration body is elided in this diff; attaching the clamp hook
        ## to every Linear layer is an assumption
        if isinstance(module, torch.nn.Linear):
            module.register_forward_hook(forward_hook)


register_fp16_pre_hooks(model)
tokenizer.pad_token = tokenizer.eos_token

prompts = [
    "为什么企鹅没有被北极熊吃掉?",
    "树枝上有十只鸟,如果你射杀了一只,还剩下几只?请用中文回答",
    "How many r in strawberry.",
    "There is a girl who likes adventure,",
    "hello",
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "system", "content": [{"type": "text",
                                        "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
        {"role": "user", "content": [{"type": "text", "text": prompt}]},
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)

## left padding, so batched decoder-only generation starts right after each prompt
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, padding_side='left')

outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_new_tokens=512,
    num_return_sequences=1,
    do_sample=False,  ## greedy decoding; change this to align with the official usage
    eos_token_id=200020,
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]

decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)

"""
Prompt: 为什么企鹅没有被北极熊吃掉?
Generated: ### 1. **地理分布差异**
- **企鹅**:主要生活在**南半球**,例如**南极洲**。在南极洲,企鹅没有天敌,因为这里的环境非常恶劣,食物资源有限,动物数量也有限,企鹅是这里的顶级掠食者之一。
- **北极熊**:主要生活在**北半球**,例如**北极地区**。北极熊是北极地区的顶级掠食者之一,它们以海豹等动物为食。
- **结论**:由于**地理分布**的差异,**企鹅和北极熊**在自然界中**无法相遇**,因此**北极熊无法吃掉企鹅**。

### 2. **人为因素**
- **动物园或水族馆**:在**人为因素**的影响
--------------------------------------------------
Prompt: 树枝上有十只鸟,如果你射杀了一只,还剩下几只?请用中文回答
Generated: 让我一步步思考这个问题:

1. 首先,树枝上有10只鸟
2. 射杀1只后,还剩9只
3. 但实际上,当枪声响起,其他鸟会因惊吓而飞走
4. 所以,当射杀1只后,树上不会剩下任何鸟

因此,答案是:0只

因为鸟会因枪声而飞走,不会继续停留在树上。
--------------------------------------------------
Prompt: How many r in strawberry.
Generated: Let me solve this step by step.

1. First, let me count the r's in "strawberry" as I say it
* s (not r)
* t (not r)
* r (1st r)
* a (not r)
* w (not r)
* b (not r)
* b (not r)
* e (not r)
* r (2nd r)
* r (3rd r)
* y (not r)

2. Counting the r's: 3 r's

Therefore, there is 3 r in strawberry.

The answer is 3.
--------------------------------------------------
Prompt: There is a girl who likes adventure,
Generated: There is a girl who likes adventure, and her name is Emily. Emily has always been drawn to the thrill of the unknown, the excitement of stepping into uncharted territory. Here is a story about
--------------------------------------------------
Prompt: hello
Generated: Hello! How can I assist you today?
--------------------------------------------------
"""
```

## Generate the Model

```bash
pip3 install git+https://github.com/intel/auto-round.git@bf16_inference
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

model_name = "MiniMaxAI/MiniMax-Text-01"
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)  ## needed for num_hidden_layers below
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16)

## keep every MoE routing gate in 16 bits instead of quantizing it
fp_layers = [f"model.layers.{i}.block_sparse_moe.gate" for i in range(config.num_hidden_layers)]
layer_config = {}
for fp_layer in fp_layers:
    layer_config[fp_layer] = {"bits": 16}

## spread the 32 experts across two GPUs via regex device_map keys
device_map = {}
for i in range(32):
    key = fr"model\.layers\.\d+\.block_sparse_moe\.experts\.{str(i)}\..*$"
    if i < 16:  ## the exact split condition is elided in this diff; an even 16/16 split is assumed
        device_map[key] = 0
    else:
        device_map[key] = 1

from auto_round import AutoRound

autoround = AutoRound(model=model, tokenizer=tokenizer, layer_config=layer_config, device_map=device_map,
                      batch_size=1, gradient_accumulate_steps=4, seqlen=512)
autoround.quantize()
autoround.save_quantized(format="auto_round", output_dir="tmp_autoround")
```
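
Once quantization completes, the checkpoint saved in `tmp_autoround` can be loaded exactly like the published model in the inference section above. A quick sanity check, assuming the same environment and the `tmp_autoround` output directory from the script:

```python
from auto_round import AutoRoundConfig  ## must import for autoround format
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("tmp_autoround", trust_remote_code=True,
                                             torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("tmp_autoround", trust_remote_code=True)
```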