Mixtress 135M

Model Description

Mixtress 135M is a transformer language model based on the Mixtral architecture. It is the product of approximately 20 weeks of free Kaggle compute hours, spread across 67 twelve-hour training runs.

Training data

Mixtress was trained on a curated sample of data drawn from the following datasets:

  • allenai/c4
  • HuggingFaceFW/fineweb-edu
  • togethercomputer/RedPajama-Data-V2
  • Muennighoff/natural-instructions
  • databricks/databricks-dolly-15k
  • HuggingFaceTB/smollm-corpus
  • open-phi/textbooks
  • roneneldan/TinyStories

Training procedure

This model was trained on 2.15 billion tokens over 20,000 optimizer steps. It was trained as a causal (masked autoregressive) language model, using cross-entropy loss.
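
For a sense of scale, the two figures above imply an average batch size in tokens. The short check below is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes tokens were spread evenly across optimizer steps, which the card does not state.

```python
# Figures taken from the training summary above.
total_tokens = 2.15e9      # 2.15 billion training tokens
optimizer_steps = 20_000   # number of optimizer steps

# Assumption: tokens were distributed evenly across steps.
tokens_per_step = total_tokens / optimizer_steps
print(f"{tokens_per_step:,.0f} tokens per optimizer step")  # 107,500
```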

The final training loss was 1.941, the validation loss was 2.206, and the perplexity was 9.136.
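
Perplexity is conventionally the exponential of the mean cross-entropy loss. The sketch below applies that relation to the reported numbers, assuming the loss is measured in nats (the card does not say). Under that assumption the validation loss of 2.206 gives a perplexity of about 9.08, while the reported 9.136 corresponds to a loss of about 2.212, so the two figures may come from slightly different evaluation batches or checkpoints.

```python
import math

val_loss = 2.206             # reported validation loss
reported_perplexity = 9.136  # reported perplexity

# Assumption: the loss is the mean cross-entropy in nats,
# so perplexity = exp(loss) and loss = ln(perplexity).
ppl_from_val_loss = math.exp(val_loss)
loss_from_reported_ppl = math.log(reported_perplexity)

print(f"exp(validation loss)     = {ppl_from_val_loss:.3f}")
print(f"ln(reported perplexity)  = {loss_from_reported_ppl:.3f}")
```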

Mixtress was pre-trained and fine-tuned simultaneously. Full reproduction code may be found at this URL, or in the Jupyter notebook in this repository.

Intended Use and Limitations

The model is best at what it was pre-trained for: generating conversational text and answering questions from a prompt.

How to use

You can use this model directly with a pipeline for text generation. This example generates a different sequence each time it's run:

>>> from transformers import pipeline
>>> generator = pipeline('text-generation', model='UNSAFE/Mixtress-135M')
>>> generator("In a shocking finding, ", do_sample=True, temperature=0.7, min_length=50)

[{'generated_text': 'In a shocking finding, 20 years ago, U.S. President Donald Trump'}]
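
Because the example above samples, its output changes between runs. If a repeatable result is needed, one option is to fix the random seed with `set_seed` from `transformers` before generating. The sketch below is an illustration rather than the author's method: the helper names `build_generation_kwargs` and `generate_reproducibly` and the seed value are hypothetical, and `max_new_tokens` is swapped in for `min_length` to cap the number of generated tokens.

```python
def build_generation_kwargs(temperature=0.7, max_new_tokens=50):
    """Return sampling settings (hypothetical helper for this sketch)."""
    return {
        "do_sample": True,                 # sample instead of greedy decoding
        "temperature": temperature,        # same temperature as the example above
        "max_new_tokens": max_new_tokens,  # upper bound on newly generated tokens
    }


def generate_reproducibly(prompt, seed=0):
    """Generate text with a fixed seed so repeated calls give the same sample."""
    # Imported here so the helper above can be used without transformers installed.
    from transformers import pipeline, set_seed

    set_seed(seed)  # seeds Python, NumPy, and PyTorch random number generators
    generator = pipeline("text-generation", model="UNSAFE/Mixtress-135M")
    return generator(prompt, **build_generation_kwargs())
```

Calling `generate_reproducibly("In a shocking finding, ")` twice with the same seed on the same hardware and library versions should return the same text; changing `seed` gives a different sample.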

Eval results

All evaluations were done using the Pythia evaluation harness.

Scores

Model and Size             ARC-easy  ARC-challenge  HellaSwag  PiQA    TinyMMLU  TriviaQA  Winogrande
EleutherAI/gpt-neo-125m    22.95%    N/A            30.26%     N/A     N/A       N/A       N/A
HuggingFaceTB/SmolLM-135M  43.99%    N/A            42.30%     69.60%  30.23%    4.11%     52.70%
OpenAI/GPT2-137M           31.09%    N/A            29.76%     62.51%  26.29%    0.49%     49.72%
UNSAFE/Mixtress-135M       29.21%    24.57%         26.99%     52.67%  31.71%    N/A       50.91%

Join Us

If you would like to chat with us, please join the Discord server!

Model size: 136M params (Safetensors, tensor type F32)