mpt-125m-c4

Model Description

An MPT-125M model pretrained on the C4 dataset.

Training data

Trained on the C4 dataset from Hugging Face.
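
A minimal sketch of loading the training corpus with the `datasets` library. The dataset id `allenai/c4` and the English split are assumptions; the card does not state which C4 configuration was used.

```python
from datasets import load_dataset

# Stream the English split of C4 instead of downloading the full corpus.
# The dataset id "allenai/c4" is an assumption; the card only says "C4".
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

# Peek at a few documents.
for example in c4.take(3):
    print(example["text"][:200])
```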

Training procedure

This model was trained on C4 for ~2.5B tokens. Training took ~1 hour on 104 A100-40GB GPUs.
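
As a rough sanity check on these figures (using only the rounded numbers above), ~2.5B tokens in ~1 hour across 104 GPUs works out to about 2.5e9 / (104 × 3600 s) ≈ 6,700 tokens per second per GPU.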

Intended Use and Limitations

This model is primarily intended for generating text from a prompt. Its purpose is to explore pretraining of small models for research.
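
A minimal generation sketch, assuming the standard `transformers` AutoModelForCausalLM / AutoTokenizer API. `trust_remote_code=True` is needed because MPT repositories ship custom modeling code; the prompt and sampling parameters below are illustrative, not recommendations from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from this card; trust_remote_code is required because the
# MPT architecture is implemented in custom code shipped with the repo.
model_name = "wtang06/mpt-125m-c4"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

prompt = "The history of machine learning begins with"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling settings here are illustrative only.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.95,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```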
