
Improve model card: remove incorrect project page, add usage and citation

#5
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +53 -2
README.md CHANGED
@@ -2,7 +2,6 @@
  library_name: transformers
  license: apache-2.0
  pipeline_tag: text-generation
- project_page: https://sites.google.com/view/eagle-llm
  repo_url: https://github.com/recursal/RADLADS-paper
  ---
 
@@ -33,4 +32,56 @@ The file numbering is currently off by one from the step numbers shown in the paper.
  |L28-D3584-qwerky7_qwen2-3-4k-ckpt5.pth|2|Qwen2.5-7B-Instruct|RAD-RWKV7|4k ctxlen training, early checkpoint|
  |L28-D3584-qwerky7_qwen2-3-4k.pth|2|Qwen2.5-7B-Instruct|RAD-RWKV7|4k ctxlen training|

- More information can be found at the Github repository: [https://github.com/recursal/RADLADS-paper](https://github.com/recursal/RADLADS-paper)
+ ## Usage
+
+ This repository contains raw PyTorch `.pth` checkpoints from the RADLADS paper, intended primarily for research, ablation studies, and conversion. To use these models with the Hugging Face `transformers` library, you will generally need to convert them to the Hugging Face format first.
+
+ Please refer to the original GitHub repository for detailed instructions on converting these checkpoints to Hugging Face-compatible formats, and for specific usage examples: [https://github.com/recursal/RADLADS-paper](https://github.com/recursal/RADLADS-paper)
+
+ For models already converted to Hugging Face format and ready for direct use, see the main [Recursal RADLADS collection](https://huggingface.co/collections/recursal/radlads-6818ee69e99e729ba8a87102) on the Hugging Face Hub.
+
+ A conceptual example for loading a text-generation model with `transformers` (after it has been converted to Hugging Face format, or when using a model from the main collection):
+
+ ```python
+ from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ # Replace with the ID of a converted model from the Recursal RADLADS collection,
+ # or with the local path to a checkpoint you have converted yourself.
+ model_name = "recursal/RADLADS-RWKV7-Qwen2.5-7B"
+
+ try:
+     tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+     model = AutoModelForCausalLM.from_pretrained(
+         model_name,
+         torch_dtype=torch.bfloat16,  # adjust dtype to match the model
+         device_map="auto",
+         trust_remote_code=True,
+     )
+     pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
+
+     prompt = "The key to life is"
+     print(pipe(prompt, max_new_tokens=20, do_sample=True)[0]["generated_text"])
+ except Exception as e:
+     print(f"Could not load the model directly with the pipeline: {e}")
+     print("This repository contains raw checkpoints that require conversion first.")
+     print("See https://github.com/recursal/RADLADS-paper for conversion and usage instructions,")
+     print("or explore pre-converted models at https://huggingface.co/collections/recursal/radlads-6818ee69e99e729ba8a87102")
+ ```
+
+ ## Citation
+
+ If you use this code or find our work valuable, please consider citing RADLADS:
+
+ ```bibtex
+ @misc{goldstein2025radladsrapidattentiondistillation,
+       title={RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale},
+       author={Daniel Goldstein and Eric Alcaide and Janna Lu and Eugene Cheah},
+       year={2025},
+       eprint={2505.03005},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2505.03005},
+ }
+ ```
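
For readers who want to peek inside one of the raw `.pth` checkpoints listed above before running the conversion scripts, here is a minimal, hypothetical sketch. It assumes the file has been downloaded locally and that `torch.load` returns a flat parameter dictionary, possibly wrapped under a `"model"` key; the exact layout may vary between checkpoints, so treat the snippet as illustrative rather than as the RADLADS conversion procedure.

```python
import torch

# Hypothetical inspection sketch (not the RADLADS conversion script):
# the filename comes from the checkpoint table above and is assumed to be downloaded locally.
ckpt_path = "L28-D3584-qwerky7_qwen2-3-4k.pth"

state = torch.load(ckpt_path, map_location="cpu")

# Some training scripts save {"model": state_dict, ...}; unwrap if that is the case here.
if isinstance(state, dict) and isinstance(state.get("model"), dict):
    state = state["model"]

# Print the first few parameter names and shapes as a sanity check.
for name, value in list(state.items())[:10]:
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```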