
MiniPLM-llama3.1-212M

paper | code

MiniPLM-llama3.1-212M is a 212M-parameter model with the LLaMA3.1 architecture, pre-trained from scratch on the Pile using the MiniPLM knowledge distillation framework with the official Qwen1.5-1.8B model as the teacher. This model demonstrates the flexibility of the MiniPLM framework in conducting knowledge distillation across model families.
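
The checkpoint can be loaded and sampled with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming the model works with the standard AutoTokenizer/AutoModelForCausalLM text-generation API; the prompt and sampling settings are purely illustrative.

# Minimal text-generation sketch using the transformers causal-LM API.
# Assumes the checkpoint loads with the standard Auto classes (LLaMA architecture).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniLLM/MiniPLM-llama3.1-212M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))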

We also open-source the pre-training corpus refined by Difference Sampling in MiniPLM for reproducibility.
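
If the refined corpus is published as a Hugging Face dataset, it can be streamed with the datasets library. The repository id below is a hypothetical placeholder; substitute the actual dataset name from the MiniLLM organization.

# Hypothetical sketch: streaming the Difference-Sampling-refined pre-training corpus.
# "MiniLLM/<refined-pile-corpus>" is a placeholder, not a real dataset id.
from datasets import load_dataset

corpus = load_dataset("MiniLLM/<refined-pile-corpus>", split="train", streaming=True)
for example in corpus.take(3):  # inspect a few raw records
    print(example)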

Evaluation

MiniPLM models achieve better performance given the same computation and scale well across model sizes.

Baseline Models

Citation

@article{miniplm,
    title={MiniPLM: Knowledge Distillation for Pre-Training Language Models}, 
    author={Yuxian Gu and Hao Zhou and Fandong Meng and Jie Zhou and Minlie Huang},
    journal={arXiv preprint arXiv:2410.17215},
    year={2024}
}