nielsr (HF Staff) committed
Commit 4789d23 · verified · 1 Parent(s): fd234ef

Add comprehensive model card for RetNet model


This PR adds a comprehensive model card for this RetNet model, which is part of the research presented in the paper "[A Systematic Analysis of Hybrid Linear Attention](https://huggingface.co/papers/2507.06457)".

The model card includes:
- Relevant metadata (`license`, `library_name`, `pipeline_tag`) to enhance discoverability on the Hugging Face Hub.
- Links to the associated paper, project page (collection), and GitHub repository for the Flash Linear Attention project.
- A clear description of the model and a usage example using the `transformers` library for text generation.
- Academic citations for the FLA library and the related linear attention works.

Files changed (1)
README.md +112 -0
README.md ADDED
@@ -0,0 +1,112 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# A Systematic Analysis of Hybrid Linear Attention: RetNet Model

This repository contains a RetNet model checkpoint from the research presented in the paper [**A Systematic Analysis of Hybrid Linear Attention**](https://huggingface.co/papers/2507.06457). The work systematically evaluates linear attention models, both standalone and in hybrid architectures, to address the quadratic complexity and memory cost that standard Transformers incur on long sequences.

As part of this analysis, 72 models were trained and open-sourced (36 at 340M parameters and 36 at 1.3B parameters), covering six linear attention variants across five hybridization ratios. This enables a comprehensive comparison on standard language modeling and recall tasks, revealing which architectural choices achieve Transformer-level recall efficiently.

## About this Model

This model is a **RetNet** variant, one of the architectures investigated in "A Systematic Analysis of Hybrid Linear Attention". It is a 1.3B parameter model trained on 100B tokens, designed for efficient sequence modeling and text generation.

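At a high level, retention replaces softmax attention with a decayed linear recurrence over a matrix-valued state, which is what lets the model keep a constant-size state instead of a growing key-value cache during generation. The snippet below is a minimal, illustrative sketch of that recurrent view for a single head; the function name, the fixed decay `gamma`, and the omission of normalization and rotary phase are simplifications for exposition, not the optimized kernels used by the FLA library:

```python
import torch

def retention_recurrent(q, k, v, gamma: float = 0.9):
    """Naive recurrent-form retention for a single head (illustrative only).

    q, k: (seq_len, d_k); v: (seq_len, d_v). The state S is a d_k x d_v matrix
    that is decayed by `gamma` and updated with a key-value outer product at
    every step, so memory use is constant in sequence length.
    """
    d_k, d_v = q.shape[-1], v.shape[-1]
    S = torch.zeros(d_k, d_v, dtype=q.dtype)
    outputs = []
    for q_t, k_t, v_t in zip(q, k, v):
        S = gamma * S + torch.outer(k_t, v_t)  # decay old state, write new association
        outputs.append(q_t @ S)                # read out with the current query
    return torch.stack(outputs)

# Toy usage with random projections
q, k, v = (torch.randn(8, 16) for _ in range(3))
print(retention_recurrent(q, k, v).shape)  # torch.Size([8, 16])
```

In practice the released checkpoints use hardware-efficient parallel/chunkwise kernels from the FLA library rather than a Python loop; the sketch only conveys why the memory footprint stays constant in sequence length.
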
## Usage

This model is compatible with the Hugging Face `transformers` library. You can load and use it for text generation as follows:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the specific model ID of this repository, e.g. 'fla-hub/retnet-1.3B-100B'
model_name = "fla-hub/retnet-1.3B-100B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # required for custom architectures such as RetNet
).eval()

input_prompt = "Power goes with permanence. Impermanence is impotence. And rotation is castration."
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.to(model.device)

# Generate text
outputs = model.generate(input_ids, max_new_tokens=64)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

print(f"Prompt:\n{input_prompt}\n")
print(f"Generated:\n{generated_text}")
```

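For quick experiments, the high-level `pipeline` API wraps the same loading logic. This is a minimal sketch under the assumption that the model ID above is correct and that `trust_remote_code=True` is still required; the prompt is only an example:

```python
from transformers import pipeline

# Assumes the same checkpoint as in the example above.
generator = pipeline(
    "text-generation",
    model="fla-hub/retnet-1.3B-100B",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

output = generator("Linear attention models are appealing because", max_new_tokens=64)
print(output[0]["generated_text"])
```

Both paths rely on the custom modeling code shipped with the repository, which is why `trust_remote_code=True` is set.
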
## Paper

The model was presented in the paper:
[**A Systematic Analysis of Hybrid Linear Attention**](https://huggingface.co/papers/2507.06457)

## Project Page

Explore more models and research related to Flash Linear Attention in the Hugging Face collection:
[**Hybrid Linear Attention Research Collection**](https://huggingface.co/collections/m-a-p/hybrid-linear-attention-research-686c488a63d609d2f20e2b1e)

## Code

The official implementation and further details on the Flash Linear Attention (FLA) project are available in its GitHub repository:
[**fla-org/flash-linear-attention**](https://github.com/fla-org/flash-linear-attention)

## Citation

If you find this work useful, please consider citing the FLA library and the related linear attention papers:

```bibtex
@software{yang2024fla,
  title  = {FLA: A Triton-Based Library for Hardware-Efficient Implementations of Linear Attention Mechanism},
  author = {Yang, Songlin and Zhang, Yu},
  url    = {https://github.com/fla-org/flash-linear-attention},
  month  = jan,
  year   = {2024}
}

@inproceedings{yang2024gdn,
  title     = {Gated Delta Networks: Improving Mamba2 with Delta Rule},
  author    = {Songlin Yang and Jan Kautz and Ali Hatamizadeh},
  booktitle = {Proceedings of ICLR},
  year      = {2025}
}

@inproceedings{yang2024deltanet,
  title     = {Parallelizing Linear Transformers with the Delta Rule over Sequence Length},
  author    = {Yang, Songlin and Wang, Bailin and Zhang, Yu and Shen, Yikang and Kim, Yoon},
  booktitle = {Proceedings of NeurIPS},
  year      = {2024}
}

@inproceedings{zhang2024gsa,
  title     = {Gated Slot Attention for Efficient Linear-Time Sequence Modeling},
  author    = {Zhang, Yu and Yang, Songlin and Zhu, Ruijie and Zhang, Yue and Cui, Leyang and Wang, Yiqiao and Wang, Bolun and Shi, Freda and Wang, Bailin and Bi, Wei and Zhou, Peng and Fu, Guohong},
  booktitle = {Proceedings of NeurIPS},
  year      = {2024}
}

@inproceedings{qin2024hgrn2,
  title     = {HGRN2: Gated Linear RNNs with State Expansion},
  author    = {Qin, Zhen and Yang, Songlin and Sun, Weixuan and Shen, Xuyang and Li, Dong and Sun, Weigao and Zhong, Yiran},
  booktitle = {Proceedings of COLM},
  year      = {2024}
}

@inproceedings{yang2024gla,
  title     = {Gated Linear Attention Transformers with Hardware-Efficient Training},
  author    = {Yang, Songlin and Wang, Bailin and Shen, Yikang and Panda, Rameswar and Kim, Yoon},
  booktitle = {Proceedings of ICML},
  year      = {2024}
}
```