yixinsong committed
Commit b83c849 · 1 Parent(s): 4f2ba0d
Files changed (1)
  1. README.md +16 -15
README.md CHANGED
@@ -11,6 +11,22 @@ Qwen2-7B-ReLU is a variant of Qwen2-7B that replaces the SiLU/Swish activation f
 - Maintains performance comparable to, or even better than, the original Qwen2-7B
 - Significantly increases activation sparsity, enabling further optimization and compression
 
+## Benchmarks
+
+The model has been evaluated on standard benchmarks to verify its performance:
+
+- **MMLU**: 69.19% (5-shot)
+- **IFEval**: 73.2% (Prompt Strict-Accuracy)
+- **Livebench**:
+  - Average: 32.1%
+  - Coding: 39.8%
+  - Data Analysis: 45.3%
+  - Instruction Following: 58.1%
+  - Language: 9.0%
+  - Math: 22.0%
+  - Reasoning: 18.7%
+
+These results demonstrate that the ReLU modification maintains competitive performance while achieving higher sparsity compared to the original model.
 ## Technical Details
 
 The key modification in this version is the application of ReLU activation to both branches in the MLP block. The implementation modifies the original `Qwen2MLP` class as follows:
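The hunk ends at that sentence, before the code it refers to. As a point of reference, here is a minimal sketch of the described two-branch ReLU modification, assuming the standard `transformers` `Qwen2MLP` layout (`gate_proj`, `up_proj`, `down_proj`); the class name `Qwen2MLPReLU` and the demo dimensions are illustrative, not the repository's actual code:

```python
import torch
import torch.nn as nn


class Qwen2MLPReLU(nn.Module):
    """Illustrative Qwen2-style MLP with ReLU applied to both branches."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        # Original Qwen2 applies SiLU to the gate branch only; here ReLU is
        # applied to both projections, zeroing negative pre-activations and
        # increasing activation sparsity.
        self.act_fn = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.act_fn(self.up_proj(x)))


if __name__ == "__main__":
    mlp = Qwen2MLPReLU(hidden_size=3584, intermediate_size=18944)  # Qwen2-7B-sized dimensions
    out = mlp(torch.randn(1, 8, 3584))
    print(out.shape)  # torch.Size([1, 8, 3584])
```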
@@ -63,22 +79,7 @@ outputs = model.generate(**inputs)
 response = tokenizer.decode(outputs[0])
 ```
 
-## Benchmarks
-
-The model has been evaluated on standard benchmarks to verify its performance:
-
-- **MMLU**: 69.19% (5-shot)
-- **IFEval**: 73.2% (Prompt Strict-Accuracy)
-- **Livebench**:
-  - Average: 32.1%
-  - Coding: 39.8%
-  - Data Analysis: 45.3%
-  - Instruction Following: 58.1%
-  - Language: 9.0%
-  - Math: 22.0%
-  - Reasoning: 18.7%
-
-These results demonstrate that the ReLU modification maintains competitive performance while achieving higher sparsity compared to the original model.
 
 ## Citation
 
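The second hunk shows only the tail of the README's usage snippet. For completeness, here is a minimal end-to-end sketch of that flow, assuming the standard `transformers` auto classes; the model path `path/to/Qwen2-7B-ReLU` is a placeholder, and the README's actual snippet (prompt, generation arguments, whether `trust_remote_code` is needed) may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: substitute the actual Hugging Face repo id or a local checkout.
model_path = "path/to/Qwen2-7B-ReLU"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# trust_remote_code is an assumption here, in case the custom ReLU MLP ships as remote code.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain activation sparsity in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# These two lines mirror the context visible in the second hunk above.
outputs = model.generate(**inputs, max_new_tokens=64)
response = tokenizer.decode(outputs[0])
print(response)
```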