MiniLLM/Pretrain-Qwen-1.2B

Pretrain-Qwen-1.2B

Pretrain-Qwen-1.2B is a 1.2B-parameter model with the Qwen architecture, conventionally pre-trained from scratch on the Pile for 50B tokens.

We also open-source the tokenized pre-training corpus for reproducibility.

It is used as the baseline for MiniPLM-Qwen-1.2B.
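As a rough illustration of how this baseline might be loaded for text generation, the sketch below uses the Hugging Face `transformers` Auto classes. It assumes that the `MiniLLM/Pretrain-Qwen-1.2B` repository ships a tokenizer and a config these classes can resolve, and that `torch` and `transformers` are installed; the helper name `generate` and its defaults are illustrative, not part of the released code.

```python
MODEL_ID = "MiniLLM/Pretrain-Qwen-1.2B"  # repository name as shown on this page


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Hypothetical helper: load the checkpoint and greedily continue `prompt`."""
    # Imported lazily so that defining the helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository provides tokenizer files and a compatible config.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # Greedy decoding keeps the output deterministic for a given prompt.
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The Pile is a dataset that")` would download the weights on first use. Because this is a pre-trained base model rather than an instruction-tuned one, the output should be read as a plain continuation of the prompt.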

Evaluation

MiniPLM models achieve better performance given the same computation and scale well across model sizes.

Other Baselines

VanillaKD

Citation

@article{miniplm,
    title={MiniPLM: Knowledge Distillation for Pre-Training Language Models}, 
    author={Yuxian Gu and Hao Zhou and Fandong Meng and Jie Zhou and Minlie Huang},
    journal={arXiv preprint arXiv:2410.17215},
    year={2024}
}

Model size: 1.16B params (Safetensors), tensor type F32