sedrickkeh committed
Commit: 5bde2e1
Parent: 17b544f
Update README.md
README.md CHANGED
@@ -79,6 +79,8 @@ This is a 7B parameter model with the [Mamba](https://arxiv.org/abs/2312.00752)
 Mamba is a state-space model that does not use self-attention unlike the standard transformer architecture. It has shown strong performance on various natural language benchmarks. To date, the largest publicly released pure-Mamba pretrain is [Mamba-2.8B](https://huggingface.co/state-spaces/mamba-2.8b).
 We follow their training recipe and release our version of Mamba-7B.
 
+This model was trained as a baseline for our paper [Linearizing Large Language Models](https://arxiv.org/abs/2405.06640).
+
 ## Model Details
 - **Developed by**: [Toyota Research Institute](https://www.tri.global/our-work/robotics)
 - **Model Type**: This is an auto-regressive language model based on the [Mamba](https://arxiv.org/abs/2312.00752) architecture.