---
datasets:
- wikimedia/wikipedia
language:
- en
---
# Model Info

An xLSTM language model trained on the wikimedia/wikipedia 20231101.en dataset, shuffled with seed 42.
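A minimal sketch of how that corpus can be reproduced with the `datasets` library (the split and loading details are assumptions, not taken from this card):

```python
from datasets import load_dataset

# Assumed data preparation: the 20231101.en Wikipedia snapshot,
# shuffled with seed 42 as stated above.
dataset = load_dataset("wikimedia/wikipedia", "20231101.en", split="train")
dataset = dataset.shuffle(seed=42)
```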
No evaluation has been run on this model.
Model checkpoints are available as branches of this repository, as sketched below.
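Because the checkpoints live on branches, a specific one can be loaded via the `revision` argument of `from_pretrained` (the branch name below is a hypothetical placeholder; check the repository's branch list for the real names):

```python
from transformers import AutoModelForCausalLM

# "checkpoint-3650" is a hypothetical branch name, shown only to illustrate
# the revision mechanism; list the repo's branches for the actual checkpoints.
model = AutoModelForCausalLM.from_pretrained(
    "J4bb4wukis/xlstm_wikipedia_en_shuffeld",
    revision="checkpoint-3650",
)
```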
Model configuration and training hyperparameters:
```
# model configuration
num_blocks=24,
num_heads=4,
embedding_dim=768,

# training arguments
per_device_train_batch_size=32,
logging_steps=3650,
gradient_accumulation_steps=8,
num_train_epochs=1,
weight_decay=0.1,
warmup_steps=1_000,
lr_scheduler_type="cosine",
learning_rate=5e-4,
save_steps=3650,
fp16=True,
```
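The training values above map directly onto `transformers.TrainingArguments`; a sketch of the corresponding setup, assuming a standard `Trainer` run (the `output_dir` is a placeholder, not taken from this card):

```python
from transformers import TrainingArguments

# Assumed reconstruction of the training arguments listed above;
# output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="xlstm_wikipedia_en",
    per_device_train_batch_size=32,
    gradient_accumulation_steps=8,  # effective batch size: 32 * 8 = 256 per device
    num_train_epochs=1,
    learning_rate=5e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1_000,
    weight_decay=0.1,
    logging_steps=3650,
    save_steps=3650,
    fp16=True,
)
```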
## How to use

Install the xlstm package, the mLSTM kernels, and the NX-AI fork of transformers with xLSTM support:

```
pip install xlstm
pip install mlstm_kernels
pip install 'transformers @ git+https://[email protected]/NX-AI/transformers.git@integrate_xlstm_clean'
```
Then load the model and generate text:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

xlstm = AutoModelForCausalLM.from_pretrained("J4bb4wukis/xlstm_wikipedia_en_shuffeld")
tokenizer = AutoTokenizer.from_pretrained("J4bb4wukis/xlstm_wikipedia_en_shuffeld")

prompt = "Angela Merkel is"
inputs = tokenizer(prompt, return_tensors="pt").input_ids

# Sample up to 100 new tokens with top-k/top-p filtering.
outputs = xlstm.generate(inputs, max_new_tokens=100, do_sample=True, top_k=10, top_p=0.95)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```
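Note that `do_sample=True` makes the output stochastic; set `do_sample=False` (and drop `top_k`/`top_p`) for deterministic greedy decoding.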