KaraKaraWitch and nielsr (HF Staff) committed
Commit 806eff7 (verified) · 1 parent: c94c56b

Add pipeline tag and link to paper and GitHub repository (#2)


- Add pipeline tag and link to paper and GitHub repository (7ccb2236ed34dbedf185da44dcf9ee2f361a2dc9)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -1,7 +1,8 @@
 ---
-thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/OufWyNMKYRozfC8j8S-M8.png
-license: apache-2.0
 library_name: transformers
+license: apache-2.0
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/OufWyNMKYRozfC8j8S-M8.png
+pipeline_tag: text-generation
 ---
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/OufWyNMKYRozfC8j8S-M8.png)
@@ -20,10 +21,13 @@ Benchmarks is as follows for both Qwerky-QwQ-32B and Qwerky-72B models:
 | piqa | acc | **0.8036** | 0.7976 | 0.8248 | **0.8357** |
 | sciq | acc | **0.9630** | **0.9630** | 0.9670 | **0.9740** |
 | winogrande | acc | **0.7324** | 0.7048 | **0.7956** | 0.7632 |
-| mmlu | acc | 0.7431 | **0.7985** | 0.7746 | **0.8338** |
+| mmlu | acc | 0.7431 | **0.7985** | 0.7746 | 0.8338 |
 
 > *Note: All benchmarks except MMLU are 0-shot and Version 1. For MMLU, it's Version 2.*
 
+This repository contains the model described in the paper [RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale](https://huggingface.co/papers/2505.03005).
+
+Please check the Github repository at https://github.com/dmis-lab/Monet for more information.
 
 ## Running with `transformers`
 
@@ -82,4 +86,4 @@ As demonstrated with our Qwerky-72B-Preview and prior models such as QRWKV6-32B
 
 As with our previous models, the model's inherent knowledge and dataset training are inherited from its "parent" model. Consequently, unlike previous RWKV models trained on over 100+ languages, the QRWKV model is limited to approximately 30 languages supported by the Qwen line of models.
 
-You may find our details of the process from our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
+You may find our details of the process from our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
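
For readers landing on this commit, the `## Running with transformers` section referenced in the diff boils down to a standard causal-LM load. Below is a minimal sketch, not the model card's exact snippet: the `recursal/Qwerky-72B` repo id is a hypothetical placeholder, and the dtype, `device_map`, and `trust_remote_code=True` settings are assumptions.

```python
# Minimal usage sketch (assumptions: repo id, dtype, device_map, and
# trust_remote_code -- see the model card's own snippet for the canonical version).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "recursal/Qwerky-72B"  # hypothetical id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumed precision
    device_map="auto",            # shard across available GPUs
    trust_remote_code=True,       # custom RWKV-based layers may ship as remote code
)

prompt = "Explain linear attention in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```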