AI & ML interests

A Family of Dynamic Ultra-Fast Small Language Models Ready for Embodied Artificial General Intelligence!



SmallDoge

This is the home of the SmallDoge family: a series of high-quality, ultra-fast small language models built on dynamic algorithms. All training details and code are publicly available in the small-doge repository. We have released the following (a minimal loading sketch appears after the list):

  • Doge-SLM: A series of small language models, including pre-trained models, supervised fine-tuned models, and reinforcement-learning models.
  • Doge-CheckPoint: A series of checkpoint weights from which training can be continued on new datasets without loss spikes.
  • Doge-Downstream-Applications: A series of small language models for downstream applications.
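
The sketch below shows how such a checkpoint could be loaded with the standard transformers auto-classes. The repository id `SmallDoge/Doge-60M-checkpoint` is taken from the organization's model list and may change, and the `trust_remote_code` flag is an assumption in case the architecture is not yet built into transformers; check the individual model cards for exact names and requirements.

```python
# Minimal loading sketch, assuming the checkpoints expose the standard
# transformers auto-class interface. Repo id and trust_remote_code are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SmallDoge/Doge-60M-checkpoint"  # assumed repo id; see the org page for current models

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Hello, Doge!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```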
Figure: Doge architecture (Dynamic Mask Attention for sequence transformation, Cross Domain Mixture of Experts for state transformation).

As shown in the architecture figure above, the sequence transformation part of Doge uses Dynamic Mask Attention, which can be understood as self-attention whose mask depends on the value states during training, and as a state-space model without past-state decay during inference; this addresses the tendency of existing Transformers and SSMs to lose track of information in long contexts. The state transformation part uses Cross Domain Mixture of Experts, built from dense linear layers and sparse embedding layers; additional sparse parameters can be added so that training continues from dense weight checkpoints without retraining the entire model, reducing the cost of continuous iteration. Doge also uses RMSNorm and residual connections with learnable parameters to adapt the gradient range of deep models.
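
To make the value-state-dependent masking idea concrete, here is a minimal, illustrative sketch, not the SmallDoge implementation: a gate computed from the value states is added in log space to the attention logits, so unimportant past positions can be suppressed dynamically during training. The module name, the gate projection, and all shapes are assumptions for illustration, and the state-space inference form mentioned above is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicMaskAttentionSketch(nn.Module):
    """Illustrative sketch of value-state-dependent attention masking
    (hypothetical module; not the official Doge implementation)."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.o_proj = nn.Linear(hidden_size, hidden_size)
        # Assumed gate: one scalar per (head, key position), derived from V.
        self.gate_proj = nn.Linear(self.head_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        shape = (b, t, self.num_heads, self.head_dim)
        q = self.q_proj(x).view(shape).transpose(1, 2)  # (b, h, t, d)
        k = self.k_proj(x).view(shape).transpose(1, 2)
        v = self.v_proj(x).view(shape).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # (b, h, t, t)

        # Dynamic mask: a gate in (0, 1) per key position, computed from the
        # value states; adding log(gate) to the logits strongly suppresses
        # gated-out keys for every query.
        gate = torch.sigmoid(self.gate_proj(v)).squeeze(-1)      # (b, h, t)
        scores = scores + torch.log(gate + 1e-6).unsqueeze(-2)   # broadcast over queries

        # Standard causal mask on top of the dynamic one.
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(causal, float("-inf"))

        out = F.softmax(scores, dim=-1) @ v                       # (b, h, t, d)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```

Under this reading, inference can reuse the same gates to decide which cached states to keep, which is the state-space view described above; the actual architecture and kernels are in the small-doge repository.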