Doge Face
AI & ML interests
A Family of Dynamic Ultra-Fast Small Language Models Ready for Embodied Artificial General Intelligence!
SmallDoge
This is the home of the SmallDoge family: a series of high-quality, ultra-fast small language models built on dynamic algorithms. All training details and code are publicly available in the small-doge repository. We have released:
- Doge-SLM: A series of small language models, including pre-trained, supervised fine-tuned, and reinforcement-learning-trained variants.
- Doge-CheckPoint: A series of checkpoint weights that can continue training on new datasets without loss spikes.
- Doge-Downstream-Applications: A series of small language models for downstream applications.
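
Any of these checkpoints can be tried with the Hugging Face Transformers library. The snippet below is a minimal sketch: the repository id `SmallDoge/Doge-20M` is used only as an example and should be replaced with whichever checkpoint you want to load.

```python
# Minimal sketch of loading a SmallDoge checkpoint with Transformers.
# The repo id "SmallDoge/Doge-20M" is an example; substitute the checkpoint
# you actually want from the SmallDoge organization on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M", trust_remote_code=True)

inputs = tokenizer("Hey, how are you doing today?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```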
As shown in the figure below, the sequence transformation part of the Doge architecture uses Dynamic Mask Attention, which can be understood as self-attention whose mask is derived from the value states during training, and as a state space without past-state decay during inference, to solve the problem of existing Transformers or SSMs getting lost in long context.
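
The following sketch illustrates the training-time masking idea in plain PyTorch. It is not the official Dynamic Mask Attention implementation: the `dt_proj` projection, the top-k retention rule, and the tensor shapes are assumptions chosen only to show how a mask derived from the value states can drop low-relevance key positions (causal masking is omitted for brevity).

```python
# Illustrative sketch of a value-derived dynamic attention mask (assumptions,
# not the official Doge code). Keys judged unimportant are masked out.
import torch
import torch.nn as nn
import torch.nn.functional as F

def dynamic_mask_attention(q, k, v, dt_proj, keep_ratio=0.5):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)

    # Importance of each key position, derived from its value state.
    importance = dt_proj(v).squeeze(-1)                  # (batch, heads, seq_len)

    # Keep only the most important key positions; mask out the rest.
    keep = max(1, int(v.size(-2) * keep_ratio))
    threshold = importance.topk(keep, dim=-1).values[..., -1:]
    dyn_mask = torch.where(importance >= threshold,
                           torch.zeros_like(importance),
                           torch.full_like(importance, float("-inf")))

    scores = scores + dyn_mask.unsqueeze(-2)             # broadcast over queries
    return F.softmax(scores, dim=-1) @ v

# Example usage with random tensors and a hypothetical projection layer.
b, h, s, d = 1, 2, 16, 8
dt_proj = nn.Linear(d, 1)
out = dynamic_mask_attention(torch.randn(b, h, s, d),
                             torch.randn(b, h, s, d),
                             torch.randn(b, h, s, d), dt_proj)
print(out.shape)  # torch.Size([1, 2, 16, 8])
```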
The state transformation part of Doge uses Cross Domain Mixture of Experts, which consists of dense linear layers and sparse embedding layers; additional sparse parameters can be added to continue training from dense weight checkpoints without retraining the entire model, reducing the cost of iterating on the model. In addition, Doge uses RMSNorm and residual connections with learnable parameters to adapt the gradient range of deep models.
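
A simplified sketch of that idea is shown below. It is not the official Cross Domain Mixture of Experts code: the retrieval is reduced to a single flat key lookup, and the layer names and sizes are assumptions. It only illustrates how a dense MLP path can be combined with a sparse embedding path whose parameters could be added after dense pre-training.

```python
# Illustrative sketch (assumptions, not the official code): a dense MLP path
# plus a sparse path that retrieves a few expert embeddings by key lookup.
import torch
import torch.nn as nn

class CrossDomainMoESketch(nn.Module):
    def __init__(self, hidden: int, num_experts: int = 1024, top_k: int = 8):
        super().__init__()
        # Dense path: an ordinary MLP that is always active.
        self.dense = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )
        # Sparse path: expert keys and embeddings that can be added later,
        # e.g. when continuing training from a dense checkpoint.
        self.keys = nn.Parameter(torch.randn(num_experts, hidden) * 0.02)
        self.values = nn.Embedding(num_experts, hidden)
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, seq, hidden)
        out = self.dense(x)
        scores = x @ self.keys.t()               # score every expert key
        topv, topi = scores.topk(self.top_k, dim=-1)
        experts = self.values(topi)              # (batch, seq, top_k, hidden)
        # Add a weighted sum of the retrieved expert embeddings.
        out = out + (topv.softmax(-1).unsqueeze(-1) * experts).sum(dim=-2)
        return out

# Example usage with random input.
layer = CrossDomainMoESketch(hidden=64, num_experts=256, top_k=4)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```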
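
Likewise, a residual connection with a learnable scale can be sketched as follows; the class and parameter names are hypothetical, and `nn.RMSNorm` requires a recent PyTorch release.

```python
# Hypothetical sketch of a pre-norm sublayer with a learnable residual weight.
import torch
import torch.nn as nn

class ScaledResidual(nn.Module):
    def __init__(self, sublayer: nn.Module, hidden: int):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.RMSNorm(hidden)             # requires PyTorch >= 2.4
        self.alpha = nn.Parameter(torch.ones(1))   # learnable residual scale

    def forward(self, x):
        # Scaling the skip path lets deep stacks tune gradient magnitude.
        return self.alpha * x + self.sublayer(self.norm(x))

block = ScaledResidual(nn.Linear(64, 64), hidden=64)
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```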