---

license: apache-2.0
---

# Qwen2-7B-ReLU

Qwen2-7B-ReLU is a variant of Qwen2-7B that replaces the SiLU/Swish activation function with dReLU, achieving higher activation sparsity while maintaining performance comparable to the original model.
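For reference, the two MLP variants can be written as below. The notation is ours, not taken from the paper: $x$ is a row-vector hidden state, $W_{\text{gate}}$, $W_{\text{up}}$ and $W_{\text{down}}$ are the weights of `gate_proj`, `up_proj` and `down_proj`, and $\odot$ is element-wise multiplication.

$$
\mathrm{MLP}_{\text{SwiGLU}}(x) = \big(\mathrm{SiLU}(xW_{\text{gate}}) \odot xW_{\text{up}}\big)\,W_{\text{down}}
$$

$$
\mathrm{MLP}_{\text{dReLU}}(x) = \big(\mathrm{ReLU}(xW_{\text{gate}}) \odot \mathrm{ReLU}(xW_{\text{up}})\big)\,W_{\text{down}}
$$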

## Key Features

- Replaces SiLU/Swish activation function with dReLU
- Maintains performance comparable to, or even better than, the original Qwen2-7B
- Significantly increases activation sparsity, enabling further optimization and compression

## Technical Details

The key modification in this version is the application of ReLU activation to both branches in the MLP block. The implementation modifies the original `Qwen2MLP` class as follows:

```python
import torch.nn as nn
from transformers.activations import ACT2FN


class Qwen2MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.hidden_size = config.hidden_size
        self.intermediate_size = config.intermediate_size
        self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
        # For dReLU, config.hidden_act is expected to be "relu" (ACT2FN["relu"] is nn.ReLU).
        self.act_fn = ACT2FN[config.hidden_act]

    def forward(self, x):
        # dReLU: the activation is applied to BOTH the gate and the up branch
        # (the original Qwen2MLP applies it to the gate branch only).
        down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.act_fn(self.up_proj(x)))
        return down_proj
```
The key change is in the forward pass: the activation function is now applied to the outputs of both the gate projection and the up projection before they are multiplied. With ReLU as the activation, each element of the product is exactly zero whenever either projection output is non-positive, which contributes to the increased sparsity of the model.
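As a rough, self-contained illustration, the snippet below counts exact zeros in the intermediate activation (the input to `down_proj`) under the original SwiGLU form and under dReLU. It assumes PyTorch and uses small, randomly initialized projections with hypothetical sizes, not the trained Qwen2-7B weights. With random weights each branch is positive about half the time, so dReLU zeroes out roughly three quarters of the entries, while SiLU almost never produces an exact zero; the sparsity of the trained model will differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy sizes; Qwen2-7B's real dimensions are much larger.
hidden_size, intermediate_size = 64, 256
gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)

x = torch.randn(32, hidden_size)  # a batch of 32 random "token" vectors

with torch.no_grad():
    g, u = gate_proj(x), up_proj(x)
    swiglu_hidden = F.silu(g) * u         # original Qwen2 MLP intermediate
    drelu_hidden = F.relu(g) * F.relu(u)  # dReLU intermediate


def zero_fraction(t: torch.Tensor) -> float:
    """Fraction of entries that are exactly zero."""
    return (t == 0).float().mean().item()


swiglu_sparsity = zero_fraction(swiglu_hidden)
drelu_sparsity = zero_fraction(drelu_hidden)
print(f"SwiGLU zero fraction: {swiglu_sparsity:.3f}")
print(f"dReLU  zero fraction: {drelu_sparsity:.3f}")
```

Exact zeros matter because a sparse-inference kernel can skip the corresponding columns of `down_proj` entirely, whereas SiLU's small negative outputs still have to be multiplied through.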

## Intended Usage

This release primarily targets the research community for:
- Studying sparsity in large language models
- Model compression and optimization research
- Understanding the impact of activation functions on model behavior

## Model Limitations

- The model may exhibit biases present in the training data
- May generate incorrect, inappropriate, or harmful content
- Performance may vary across different domains and tasks
- Not suitable for production deployment without proper evaluation

## Quick Start

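Before loading the model, replace the FFN (MLP) implementation in the original Qwen2 modeling code (the `Qwen2MLP` class in Transformers' `modeling_qwen2.py`) with the dReLU version shown above.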
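If you prefer not to edit the installed file, one option is to override the class at runtime. This is a sketch under stated assumptions, not part of the official release: it assumes PyTorch and Transformers are installed, assumes Transformers' `Qwen2DecoderLayer` builds its MLP through the module-level name `Qwen2MLP` in `transformers.models.qwen2.modeling_qwen2` (check this against your installed version), and hard-codes ReLU instead of reading `config.hidden_act`. `DReLUQwen2MLP` and `apply_drelu_patch` are hypothetical names.

```python
import torch
import torch.nn as nn


class DReLUQwen2MLP(nn.Module):
    """Qwen2-style MLP with ReLU applied to both the gate and up branches (dReLU)."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.hidden_size = config.hidden_size
        self.intermediate_size = config.intermediate_size
        # Parameter names match the original Qwen2MLP so checkpoint weights map onto them.
        self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
        # Assumption: ReLU is hard-coded rather than taken from config.hidden_act.
        self.act_fn = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.act_fn(self.up_proj(x)))


def apply_drelu_patch() -> None:
    """Hypothetical helper: rebind Transformers' module-level Qwen2MLP to DReLUQwen2MLP.

    Must run before from_pretrained(). Assumes decoder layers look up the name
    Qwen2MLP in transformers.models.qwen2.modeling_qwen2 when they are constructed.
    """
    from transformers.models.qwen2 import modeling_qwen2  # imported lazily

    modeling_qwen2.Qwen2MLP = DReLUQwen2MLP
```

Call `apply_drelu_patch()` once before `AutoModelForCausalLM.from_pretrained(...)`. Because the parameter names (`gate_proj`, `up_proj`, `down_proj`) match the original class, the checkpoint weights should load into the patched modules without renaming.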



```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("PowerInfer/SparseQwen2-7B")
tokenizer = AutoTokenizer.from_pretrained("PowerInfer/SparseQwen2-7B")

prompt = "Hello"
inputs = tokenizer(prompt, return_tensors="pt")
# Set max_new_tokens explicitly; otherwise generation falls back to the default length limit.
outputs = model.generate(**inputs, max_new_tokens=64)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```



## Benchmarks



The model has been evaluated on standard benchmarks to verify its performance:



- **MMLU**: 69.19% (5-shot)

- **IFEval**: 73.2% (Prompt Strict-Accuracy)



These results suggest that the ReLU modification keeps performance competitive while achieving higher activation sparsity than the original model.
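To check sparsity on your own prompts, one option is to record the input to each MLP's `down_proj`, which is the element-wise product of the two activated branches. The helper below is a hypothetical sketch assuming PyTorch: it matches modules whose qualified name ends in `mlp.down_proj` (as in Transformers' Qwen2 layout, `model.layers.<i>.mlp.down_proj`) and counts exact zeros only, which is not necessarily how sparsity is measured in the Turbo Sparse paper. Here it is exercised on a tiny toy model.

```python
import torch
import torch.nn as nn


def measure_mlp_sparsity(model: nn.Module, **inputs) -> dict:
    """Return {module_name: fraction of exact zeros in the input to that down_proj}."""
    stats = {}
    handles = []

    def make_hook(name):
        def hook(module, args):
            hidden = args[0]  # intermediate activation fed to down_proj
            stats[name] = (hidden == 0).float().mean().item()
        return hook

    for name, module in model.named_modules():
        if name.endswith("mlp.down_proj"):
            handles.append(module.register_forward_pre_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(**inputs)
    finally:
        for h in handles:  # always detach hooks, even if the forward pass fails
            h.remove()
    return stats


# Toy stand-in (hypothetical) for the real model: a single dReLU MLP.
class _ToyMLP(nn.Module):
    def __init__(self, d: int = 16, h: int = 64):
        super().__init__()
        self.gate_proj = nn.Linear(d, h, bias=False)
        self.up_proj = nn.Linear(d, h, bias=False)
        self.down_proj = nn.Linear(h, d, bias=False)

    def forward(self, x):
        return self.down_proj(torch.relu(self.gate_proj(x)) * torch.relu(self.up_proj(x)))


class _ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = _ToyMLP()

    def forward(self, x):
        return self.mlp(x)


torch.manual_seed(0)
toy_stats = measure_mlp_sparsity(_ToyModel(), x=torch.randn(4, 16))
print(toy_stats)
```

With the real model, `measure_mlp_sparsity(model, **inputs)` (using the tokenizer output from the Quick Start) returns one zero fraction per decoder layer.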



## Citation



If you use this model in your research, please cite:



```bibtex
@article{song2024turbo,
  title={Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters},
  author={Song, Yixin and Xie, Haotong and Zhang, Zhengyan and Wen, Bo and Ma, Li and Mi, Zeyu and Chen, Haibo},
  journal={arXiv preprint arXiv:2406.05955},
  year={2024}
}
```