DeciMamba Checkpoint (Baseline)
The official Mamba-130m checkpoint, fine-tuned for language modeling on the PG-19 dataset, as presented in "DeciMamba: Exploring the Length Extrapolation Potential of Mamba".
See our GitHub repo for evaluation and training scripts.
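Language-modeling checkpoints are typically compared by perplexity, the exponential of the mean per-token negative log-likelihood. The snippet below is a minimal sketch of that formula only, not the repository's evaluation script; the function name `perplexity` and the example values are hypothetical, chosen for illustration.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood), with NLLs given in nats."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Hypothetical per-token NLLs (made-up values, not results from this checkpoint).
example_nlls = [math.log(4.0)] * 3
example_ppl = perplexity(example_nlls)
```

In practice the per-token losses would come from running the model over held-out text; lower perplexity means the model assigns higher probability to that text.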
BibTeX:
@misc{benkish2024decimambaexploringlengthextrapolation,
  title={DeciMamba: Exploring the Length Extrapolation Potential of Mamba},
  author={Assaf Ben-Kish and Itamar Zimerman and Shady Abu-Hussein and Nadav Cohen and Amir Globerson and Lior Wolf and Raja Giryes},
  year={2024},
  eprint={2406.14528},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2406.14528},
}