Add pipeline tag and link to paper and GitHub repository (#2)
Commit: 7ccb2236ed34dbedf185da44dcf9ee2f361a2dc9
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,7 +1,8 @@
 ---
-thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/OufWyNMKYRozfC8j8S-M8.png
-license: apache-2.0
 library_name: transformers
+license: apache-2.0
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/OufWyNMKYRozfC8j8S-M8.png
+pipeline_tag: text-generation
 ---
 
 
@@ -20,10 +21,13 @@ Benchmarks is as follows for both Qwerky-QwQ-32B and Qwerky-72B models:
 | piqa | acc | **0.8036** | 0.7976 | 0.8248 | **0.8357** |
 | sciq | acc | **0.9630** | **0.9630** | 0.9670 | **0.9740** |
 | winogrande | acc | **0.7324** | 0.7048 | **0.7956** | 0.7632 |
-| mmlu | acc | 0.7431 | **0.7985** | 0.7746 |
+| mmlu | acc | 0.7431 | **0.7985** | 0.7746 | 0.8338 |
 
 > *Note: All benchmarks except MMLU are 0-shot and Version 1. For MMLU, it's Version 2.*
 
+This repository contains the model described in the paper [RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale](https://huggingface.co/papers/2505.03005).
+
+Please check the Github repository at https://github.com/dmis-lab/Monet for more information.
 
 ## Running with `transformers`
 
@@ -82,4 +86,4 @@ As demonstrated with our Qwerky-72B-Preview and prior models such as QRWKV6-32B
 
 As with our previous models, the model's inherent knowledge and dataset training are inherited from its "parent" model. Consequently, unlike previous RWKV models trained on over 100+ languages, the QRWKV model is limited to approximately 30 languages supported by the Qwen line of models.
 
-You may find our details of the process from our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
+You may find our details of the process from our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
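For context, the `pipeline_tag: text-generation` line added to the front matter is what associates this checkpoint with the standard `transformers` text-generation pipeline on the Hub. Below is a minimal sketch of that usage; the repository id is a placeholder (this diff does not name the repo), and the `trust_remote_code` / `device_map` flags are assumptions rather than confirmed requirements.

```python
# Minimal sketch, assuming a placeholder repo id and loading flags.
from transformers import pipeline

generator = pipeline(
    "text-generation",            # matches the new pipeline_tag in the front matter
    model="recursal/Qwerky-72B",  # placeholder repo id (assumption, not taken from this diff)
    trust_remote_code=True,       # RWKV-style hybrid architectures may ship custom modeling code
    device_map="auto",
)

print(generator("The Eiffel Tower is located in", max_new_tokens=16)[0]["generated_text"])
```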