Add pipeline tag and link to paper and GitHub repository (#2)
Commit: 7ccb2236ed34dbedf185da44dcf9ee2f361a2dc9
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,7 +1,8 @@
 ---
-thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/OufWyNMKYRozfC8j8S-M8.png
-license: apache-2.0
 library_name: transformers
+license: apache-2.0
+thumbnail: https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/OufWyNMKYRozfC8j8S-M8.png
+pipeline_tag: text-generation
 ---
 
 
@@ -20,10 +21,13 @@ Benchmarks is as follows for both Qwerky-QwQ-32B and Qwerky-72B models:
 | piqa | acc | **0.8036** | 0.7976 | 0.8248 | **0.8357** |
 | sciq | acc | **0.9630** | **0.9630** | 0.9670 | **0.9740** |
 | winogrande | acc | **0.7324** | 0.7048 | **0.7956** | 0.7632 |
-| mmlu | acc | 0.7431 | **0.7985** | 0.7746 |
+| mmlu | acc | 0.7431 | **0.7985** | 0.7746 | 0.8338 |
 
 > *Note: All benchmarks except MMLU are 0-shot and Version 1. For MMLU, it's Version 2.*
 
+This repository contains the model described in the paper [RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale](https://huggingface.co/papers/2505.03005).
+
+Please check the Github repository at https://github.com/dmis-lab/Monet for more information.
 
 ## Running with `transformers`
 
@@ -82,4 +86,4 @@ As demonstrated with our Qwerky-72B-Preview and prior models such as QRWKV6-32B
 
 As with our previous models, the model's inherent knowledge and dataset training are inherited from its "parent" model. Consequently, unlike previous RWKV models trained on over 100+ languages, the QRWKV model is limited to approximately 30 languages supported by the Qwen line of models.
 
-You may find our details of the process from our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
+You may find our details of the process from our previous release, [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
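For context, the `pipeline_tag: text-generation` line added to the front matter is what associates this checkpoint with the standard `transformers` text-generation pipeline on the Hub. Below is a minimal sketch of that usage; the repository id is a placeholder (this diff does not name the repo), and the `trust_remote_code` / `device_map` flags are assumptions rather than confirmed requirements.

```python
# Minimal sketch, assuming a placeholder repo id and loading flags.
from transformers import pipeline

generator = pipeline(
    "text-generation",            # matches the new pipeline_tag in the front matter
    model="recursal/Qwerky-72B",  # placeholder repo id (assumption, not taken from this diff)
    trust_remote_code=True,       # RWKV-style hybrid architectures may ship custom modeling code
    device_map="auto",
)

print(generator("The Eiffel Tower is located in", max_new_tokens=16)[0]["generated_text"])
```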