Mambarim-110M
Model Summary
Mambarim-110M is the first Portuguese language model built on a state-space model architecture (Mamba) rather than a transformer.
WIP
Details
- Architecture: a Mamba model pre-trained via causal language modeling
- Size: 119,930,880 parameters
- Context length: 2048 tokens
- Dataset: Pt-Corpus Instruct (6.2B tokens)
- Language: Portuguese
- Number of steps: 758,423
This repository has the source code used to train this model.
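The figures in the list above can be related with simple arithmetic. The sketch below is not taken from the training code: the batch size (`sequences_per_step`) is not stated in this card and is a hypothetical parameter, and the dataset size uses the rounded 6.2B figure.

```python
# Figures taken from the Details list above.
DATASET_TOKENS = 6_200_000_000  # 6.2B tokens (rounded, as reported)
CONTEXT_LENGTH = 2048           # tokens per training sequence
TRAINING_STEPS = 758_423


def tokens_seen(sequences_per_step: int) -> int:
    """Tokens processed over the whole run for a hypothetical batch size."""
    return TRAINING_STEPS * sequences_per_step * CONTEXT_LENGTH


def epochs(sequences_per_step: int) -> float:
    """Approximate number of passes over the corpus for that batch size."""
    return tokens_seen(sequences_per_step) / DATASET_TOKENS


# Number of full-length sequences the corpus could be packed into.
full_sequences = DATASET_TOKENS // CONTEXT_LENGTH

if __name__ == "__main__":
    print(f"full sequences in corpus: {full_sequences:,}")
    for batch in (1, 2, 4, 8):  # hypothetical values, not from the model card
        print(f"{batch} seq/step: {tokens_seen(batch):,} tokens, "
              f"{epochs(batch):.2f} epochs")
```

Under these assumptions, 4 sequences per step would correspond to roughly one pass over the corpus. The actual batch size and number of epochs are not given here.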
Intended Uses
WIP
Out-of-scope Use
WIP
Basic usage
You need to install transformers from main until transformers=4.39.0 is released.
pip install git+https://github.com/huggingface/transformers@main
We also recommend installing both causal-conv1d and mamba-ssm (the version specifier is quoted so the shell does not treat >= as a redirection):
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
You can use the classic generate API:
>>> from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]
>>> out = model.generate(
...     input_ids,
...     repetition_penalty=1.2,
...     temperature=0.8,
...     top_k=50,
...     top_p=0.85,
...     do_sample=True,
...     max_new_tokens=10,
... )
>>> print(tokenizer.batch_decode(out))
["<s> O Natal é uma data em que as pessoas passam horas de lazer e"]
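Because the example above samples (do_sample=True), the continuation differs from run to run, and the decoded text includes the <s> marker. The following is a minimal sketch of a wrapper that fixes the random seed and drops special tokens when decoding. The helper name `complete` and the `SAMPLING_KWARGS` dictionary are hypothetical, introduced only for illustration; the sampling values are copied from the example above.

```python
# Sampling settings copied from the example above.
SAMPLING_KWARGS = {
    "repetition_penalty": 1.2,
    "temperature": 0.8,
    "top_k": 50,
    "top_p": 0.85,
    "do_sample": True,
}


def complete(prompt: str, max_new_tokens: int = 10, seed: int = 0) -> str:
    """Hypothetical helper: return one sampled continuation of `prompt`."""
    # Imported lazily so the settings above can be inspected without
    # loading the library or downloading the model.
    from transformers import AutoTokenizer, MambaForCausalLM, set_seed

    set_seed(seed)  # seed the random number generators used for sampling
    tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
    model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    out = model.generate(
        input_ids, max_new_tokens=max_new_tokens, **SAMPLING_KWARGS
    )
    # skip_special_tokens=True removes markers such as <s> from the text.
    return tokenizer.batch_decode(out, skip_special_tokens=True)[0]
```

Calling `complete("O Natal é uma")` reloads the model on every call, which keeps the sketch short; for repeated use, load the tokenizer and model once and reuse them.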
Benchmarks
Evaluations on Brazilian Portuguese benchmarks were performed using a Portuguese implementation of the EleutherAI LM Evaluation Harness (created by Eduardo Garcia).
Detailed results can be found here
Model | Average | ENEM | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FaQuAD NLI | HateBR | PT Hate Speech | tweetSentBR | Architecture |
---|---|---|---|---|---|---|---|---|---|---|---|
TeenyTinyLlama-460m | 28.86 | 20.15 | 25.73 | 27.02 | 53.61 | 13 | 46.41 | 33.59 | 22.99 | 17.28 | LlamaForCausalLM |
TeenyTinyLlama-160m | 28.2 | 19.24 | 23.09 | 22.37 | 53.97 | 0.24 | 43.97 | 36.92 | 42.63 | 11.39 | LlamaForCausalLM |
MulaBR/Mula-4x160-v0.1 | 26.24 | 21.34 | 25.17 | 25.06 | 33.57 | 11.35 | 43.97 | 41.5 | 22.99 | 11.24 | MixtralForCausalLM |
TeenyTinyLlama-460m-Chat | 25.49 | 20.29 | 25.45 | 26.74 | 43.77 | 4.52 | 34 | 33.49 | 22.99 | 18.13 | LlamaForCausalLM |
mambarim-110m | 14.16 | 18.4 | 10.57 | 21.87 | 16.09 | 1.89 | 9.29 | 15.75 | 17.77 | 15.79 | MambaForCausalLM |
GloriaTA-3B | 4.09 | 1.89 | 3.2 | 5.19 | 0 | 2.32 | 0.26 | 0.28 | 23.52 | 0.19 | GPTNeoForCausalLM |
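The Average column is consistent with an unweighted mean of the nine task scores, rounded to two decimals. The sketch below checks this for two rows; the scores are copied from the table, while the dictionary and function names are illustrative only.

```python
from statistics import mean

# Per-task scores copied from the table above, in column order
# (ENEM, BLUEX, OAB Exams, ASSIN2 RTE, ASSIN2 STS, NLI, HateBR,
# PT Hate Speech, tweetSentBR).
SCORES = {
    "TeenyTinyLlama-460m": [20.15, 25.73, 27.02, 53.61, 13, 46.41, 33.59, 22.99, 17.28],
    "Mambarim-110M": [18.4, 10.57, 21.87, 16.09, 1.89, 9.29, 15.75, 17.77, 15.79],
}


def average(scores):
    """Unweighted mean of the task scores, rounded to two decimals."""
    return round(mean(scores), 2)


if __name__ == "__main__":
    for name, scores in SCORES.items():
        print(name, average(scores))
```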