---
license: apache-2.0
---

# Model Card for TurboSparse-Mixtral

The TurboSparse-Mixtral Large Language Model (LLM) is a sparsified version of Mixtral.

<img src="takeaway.png" alt="avatar" width="300" height="200"/>

The average performance is evaluated on benchmarks from the Open LLM Leaderboard.

## Inference

Our code for accelerating TurboSparse-Mixtral is still being refined. Stay tuned! For now, you can run this model like a dense model.
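
For reference, here is a minimal dense-style generation sketch using Hugging Face `transformers`. The repository id below is an assumption, not taken from this card; substitute the model's actual path.

```python
# Minimal sketch: run the model like a regular dense model with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PowerInfer/TurboSparse-Mixtral"  # hypothetical repo id; replace with the real path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

prompt = "<|im_start|>user\nWhat is activation sparsity?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```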

## Chat Template

During sparsification, we also utilize some SFT datasets.
We use ChatML as our chat template:
```
<|im_start|>user\n{{content}}<|im_end|>\n<|im_start|>assistant\n
```
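
If the tokenizer ships this template in its config, `apply_chat_template` should produce the same string. A sketch under that assumption (repo id hypothetical, as above):

```python
# Sketch: build a ChatML prompt via the tokenizer's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PowerInfer/TurboSparse-Mixtral")  # hypothetical repo id
messages = [{"role": "user", "content": "Explain FFN sparsity in one sentence."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends "<|im_start|>assistant\n"
)
print(prompt)
```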

## Finetuning

Because the predictors for FFN neurons are merged into the model, you can finetune TurboSparse-Mixtral with any framework and algorithm.
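
As one illustration (not an official recipe), a standard LoRA setup with the `peft` library works as it would for any dense causal LM; the repo id and target modules below are assumptions:

```python
# Sketch: LoRA finetuning setup via peft, as for any dense causal LM.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("PowerInfer/TurboSparse-Mixtral")  # hypothetical repo id
lora = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed attention projection names
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```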

## License

The model is licensed under Apache-2.0. The weights are fully open for academic research and also allow **free** commercial usage.

