Mixtress 135M

Model Description

Mixtress 135M is a transformer model based on the Mixtral architecture. It is the product of approximately 20 weeks of free Kaggle compute hours and 67 twelve-hour training runs.
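Mixtral-style models replace each feed-forward block with a sparse mixture of experts. A router scores every expert for each token, only the top-k experts run (top-2 in the original Mixtral), and their outputs are combined using softmax gate weights computed over the selected experts. This card does not list Mixtress's expert count or experts-per-token, so the minimal sketch below is illustrative only: it uses scalar "tokens" and hypothetical toy experts to show the routing rule, not the model's actual configuration.

```python
import math
from typing import Callable, Sequence

# Illustrative only: scalar tokens and toy experts stand in for hidden-state
# vectors and feed-forward networks. The real Mixtress expert count and
# experts-per-token are not stated in this card.
def moe_layer(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    top_k: int = 2,
) -> float:
    """Sparse mixture-of-experts for one token with Mixtral-style top-k routing."""
    # Indices of the k largest router logits (ties keep their original order).
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only, so the k gate weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    gates = [e / total for e in exps]
    # Only the selected experts are evaluated; their outputs are gate-weighted.
    return sum(g * experts[i](x) for g, i in zip(gates, top))
```

With four toy experts and router logits of [0.0, 0.0, -5.0, -5.0], only the first two experts run, each with gate weight 0.5. The other two are skipped entirely, which is where the compute savings of sparse routing come from.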

Training data

Mixtress was trained on a curated sample of data drawn from the following datasets:

allenai/c4
HuggingFaceFW/fineweb-edu
togethercomputer/RedPajama-Data-V2
Muennighoff/natural-instructions
databricks/databricks-dolly-15k
HuggingFaceTB/smollm-corpus
open-phi/textbooks
roneneldan/TinyStories
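This card does not describe how the datasets above were sampled or weighted. One common approach is to draw each training example from a source chosen at random according to per-source mixture weights. The minimal sketch below shows that technique with a seeded generator. The function name mix_sources, the weights, and the placeholder texts are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical mixing helper: the actual Mixtress sampling weights are not
# documented in this card.
def mix_sources(
    sources: Dict[str, List[str]],
    weights: Dict[str, float],
    n_samples: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw n_samples (source, text) pairs, picking each source with its weight."""
    rng = random.Random(seed)  # seeded so the same mixture is reproducible
    names = list(sources)
    probs = [weights[name] for name in names]
    out = []
    for _ in range(n_samples):
        name = rng.choices(names, weights=probs, k=1)[0]  # weighted source choice
        out.append((name, rng.choice(sources[name])))     # uniform pick within it
    return out
```

For example, mix_sources({"roneneldan/TinyStories": [...], "allenai/c4": [...]}, {"roneneldan/TinyStories": 0.2, "allenai/c4": 0.8}, 1000) would draw about 80% of its examples from the second source. A source with weight 0 is never selected.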

Training procedure

This model was trained on 2.15 billion tokens over 20,000 optimizer steps. It was trained as an autoregressive language model with causally masked self-attention, using a next-token cross-entropy loss.
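In a causally masked language model, the logits at position t depend only on tokens 0..t, so the training target at position t is token t+1. The minimal pure-Python sketch below computes that shifted cross-entropy from a list of per-position logits. It also records the average tokens per optimizer step implied by the figures above. The function name is hypothetical, and real training code would use a framework's batched loss instead.

```python
import math
from typing import Sequence

def next_token_loss(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Mean cross-entropy (nats) of predicting token_ids[t + 1] from logits[t]."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a next-token target")
    losses = []
    for t in range(len(token_ids) - 1):
        row = logits[t]
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))  # stable log-sum-exp
        losses.append(log_z - row[token_ids[t + 1]])  # -log softmax(row)[target]
    return sum(losses) / len(losses)

# Average tokens per optimizer step implied by 2.15B tokens over 20,000 steps.
TOKENS_PER_STEP = 2_150_000_000 / 20_000  # 107,500
```

With uniform logits over a vocabulary of size V, every prediction has probability 1/V, so the loss is ln(V). That is a useful sanity check for an untrained model.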

The final train loss was 1.941, validation loss was 2.206, and perplexity was 9.136.
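Perplexity is conventionally the exponential of the mean per-token cross-entropy, measured in nats. The card does not say which split or aggregation produced the perplexity figure. Aggregation matters because exponentiating each batch's loss and then averaging always gives a value at least as large as exponentiating the overall mean loss. The minimal sketch below shows both conventions side by side. Both function names are hypothetical, and the batch version assumes equal-sized batches.

```python
import math
from typing import Sequence

def perplexity(mean_nll: float) -> float:
    """Perplexity from a mean per-token cross-entropy measured in nats."""
    return math.exp(mean_nll)

def mean_of_batch_perplexities(batch_losses: Sequence[float]) -> float:
    """Alternative aggregation: exponentiate each (equal-sized) batch loss, then average."""
    return sum(math.exp(loss) for loss in batch_losses) / len(batch_losses)
```

Since exp is convex, mean_of_batch_perplexities(losses) is never smaller than perplexity(mean(losses)), and the gap grows as batch losses vary more. Two reported perplexities are only comparable if they were aggregated the same way.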

Mixtress was pre-trained and fine-tuned simultaneously. Full reproduction code may be found at this URL, or in the Jupyter notebook in this repository.
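One common way to pre-train and fine-tune in a single run is to render each instruction record as ordinary text. That lets it flow into the same next-token stream as web and textbook data. The minimal sketch below assumes the instruction, context, and response fields used by databricks/databricks-dolly-15k. The "### Instruction:" template and the function name are hypothetical, and this is not necessarily the format Mixtress used. The notebook in this repository is the authoritative source.

```python
from typing import Dict

# Hypothetical prompt template; the format actually used by Mixtress is not
# documented in this card.
TEMPLATE = "### Instruction:\n{instruction}\n\n{context_block}### Response:\n{response}"

def format_example(record: Dict[str, str]) -> str:
    """Flatten a dolly-style record into one plain-text training string."""
    context = (record.get("context") or "").strip()
    # Only emit a context section when the record actually has one.
    context_block = f"### Context:\n{context}\n\n" if context else ""
    return TEMPLATE.format(
        instruction=record["instruction"].strip(),
        context_block=context_block,
        response=record["response"].strip(),
    )
```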

Intended Use and Limitations

The model performs best on the tasks it was trained for: generating conversational text and answering questions from a prompt.

How to use

You can use this model directly with a text-generation pipeline. Because sampling is enabled (do_sample=True), this example generates a different sequence each time it is run:

>>> from transformers import pipeline
>>> generator = pipeline('text-generation', model='UNSAFE/Mixtress-135M')
>>> generator("In a shocking finding, ", do_sample=True, temperature=0.7, min_length=50)

[{'generated_text': 'In a shocking finding, 20 years ago, U.S. President Donald Trump'}]
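The example above produces different text on each run because do_sample=True draws each next token at random, and temperature=0.7 sharpens the distribution toward likelier tokens. The minimal pure-Python sketch below shows temperature sampling for a single step. The function names are hypothetical. The library's own generation defaults, such as top-k filtering, may add steps not shown here.

```python
import math
import random
from typing import List, Optional, Sequence

def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then softmax; T < 1 sharpens the distribution."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(
    logits: Sequence[float],
    temperature: float = 0.7,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token id from the temperature-scaled distribution."""
    rng = rng or random.Random()  # unseeded by default, so repeated runs can differ
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Passing a seeded random.Random makes the draw reproducible, which is analogous to fixing the random seed before calling the pipeline.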

Eval results

All evaluations were done using the Pythia evaluation harness.

Scores

Model and Size	ARC-Easy	ARC-Challenge	HellaSwag	PIQA	TinyMMLU	TriviaQA	Winogrande
EleutherAI/gpt-neo-125m	22.95%	N/A	30.26%	N/A	N/A	N/A	N/A
HuggingFaceTB/SmolLM-135M	43.99%	N/A	42.30%	69.60%	30.23%	4.11%	52.70%
OpenAI/GPT2-137M	31.09%	N/A	29.76%	62.51%	26.29%	0.49%	49.72%
UNSAFE/Mixtress-135M	29.21%	24.57%	26.99%	52.67%	31.71%	N/A	50.91%
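Multiple-choice benchmarks such as ARC, HellaSwag, and PIQA are commonly scored by log-likelihood rather than free generation. The model scores each candidate answer given the question, and the highest-scoring candidate counts as its prediction. Some reported variants normalize that score by the answer's length. The card does not say which variant each column reports, so the minimal sketch below shows both over hypothetical log-likelihood values. The function names are illustrative and are not the harness's API.

```python
from typing import Optional, Sequence

def pick_choice(loglikelihoods: Sequence[float], choice_lengths: Optional[Sequence[int]] = None) -> int:
    """Index of the answer with the highest (optionally length-normalized) log-likelihood."""
    if choice_lengths is None:
        scores = list(loglikelihoods)
    else:
        scores = [ll / n for ll, n in zip(loglikelihoods, choice_lengths)]
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(
    all_lls: Sequence[Sequence[float]],
    gold: Sequence[int],
    all_lengths: Optional[Sequence[Sequence[int]]] = None,
) -> float:
    """Fraction of questions whose top-scoring choice matches the gold index."""
    correct = 0
    for i, (lls, g) in enumerate(zip(all_lls, gold)):
        lengths = None if all_lengths is None else all_lengths[i]
        correct += int(pick_choice(lls, lengths) == g)
    return correct / len(gold)
```

Length normalization can change the answer. Raw log-likelihoods of [-5.0, -2.0, -3.0] favor choice 1, but dividing by lengths [10, 1, 1] favors choice 0.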

Join Us

If you would like to chat with us, please join the Discord server!
