---
datasets:
- wikimedia/wikipedia
language:
- en
---

# Model Info

Mamba2 trained on a shuffled wikimedia/wikipedia 20231101.en dataset (shuffle seed=42).

Model checkpoints are available as repository branches (see the loading example at the end of this card).

Training arguments:

```
per_device_train_batch_size=32,
logging_steps=3650,
gradient_accumulation_steps=8,
num_train_epochs=1,
weight_decay=0.1,
warmup_steps=1_000,
lr_scheduler_type="cosine",
learning_rate=5e-4,
save_steps=3650,
fp16=True,
```
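
These values map onto `transformers.TrainingArguments`. Below is a minimal sketch of how the run might have been configured; only the listed hyperparameters come from this card, while the output directory, dataset preparation, and model construction are placeholders:

```python
from transformers import TrainingArguments, Trainer

# Hypothetical reconstruction of the training setup; only the keyword
# arguments listed above come from this card, the rest are placeholders.
training_args = TrainingArguments(
    output_dir="mamba2_127m_wikipedia_en_shuffled",  # placeholder path
    per_device_train_batch_size=32,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    weight_decay=0.1,
    warmup_steps=1_000,
    lr_scheduler_type="cosine",
    learning_rate=5e-4,
    logging_steps=3650,
    save_steps=3650,
    fp16=True,
)

trainer = Trainer(
    model=model,                  # a Mamba2 causal LM (construction not shown here)
    args=training_args,
    train_dataset=train_dataset,  # tokenized, shuffled Wikipedia split (not shown here)
)
trainer.train()
```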

## How to use
Install:
```
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```


```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hub
mamba2 = AutoModelForCausalLM.from_pretrained("J4bb4wukis/mamba2_127m_wikipedia_en_shuffeld")
tokenizer = AutoTokenizer.from_pretrained("J4bb4wukis/mamba2_127m_wikipedia_en_shuffeld")

# Tokenize a prompt and sample a continuation
prompt = "Angela Merkel is"
inputs = tokenizer(prompt, return_tensors="pt").input_ids
outputs = mamba2.generate(inputs, max_new_tokens=100, do_sample=True, top_k=10, top_p=0.95)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
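
Since intermediate checkpoints live on separate branches, a specific one can be loaded via the `revision` argument of `from_pretrained`. The branch name below is a placeholder; check the repository's branch list for the actual names:

```python
from transformers import AutoModelForCausalLM

# Load an intermediate checkpoint from a branch; "checkpoint-3650" is a
# hypothetical branch name, see the repository's branch list for real ones.
checkpoint = AutoModelForCausalLM.from_pretrained(
    "J4bb4wukis/mamba2_127m_wikipedia_en_shuffeld",
    revision="checkpoint-3650",
)
```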