Model Card for ru-rope-t5-small-instruct

A small Russian T5 model with Rotary Position Embeddings (RoPE), after instruction tuning.

Model Details

The model was trained on a Russian corpus with an admixture of English, using the Mixture-of-Denoisers pre-training method from UL2 on sequences of length 1024. Because the relative position bias is replaced with rotary position encoding, the model can be trained with Flash Attention 2.
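To illustrate the pre-training objective, here is a minimal sketch of how Mixture-of-Denoisers examples could be built. It follows the general UL2 idea (regular span corruption, extreme corruption, and a sequential prefix-LM task, each marked with a mode token). The corruption rates, span lengths, mode tokens `[R]`/`[S]`/`[X]`, the `<extra_id_N>` sentinel format, and the function names are assumptions for illustration; the card does not state the settings actually used for this model.

```python
import random

# Illustrative settings only; the card does not give the real values.
SPAN_DENOISERS = {
    "[R]": {"corruption_rate": 0.15, "mean_span_len": 3},  # regular span corruption
    "[X]": {"corruption_rate": 0.50, "mean_span_len": 3},  # extreme corruption
}

def corrupt_spans(tokens, corruption_rate, mean_span_len, rng):
    """Replace random non-adjacent spans with sentinels; return (inputs, targets)."""
    n = len(tokens)
    budget = max(1, round(n * corruption_rate))  # how many tokens to hide
    masked = [False] * n
    attempts = 0
    while budget > 0 and attempts < 10 * n:
        attempts += 1
        span = max(1, round(rng.expovariate(1.0 / mean_span_len)))
        span = min(span, budget)
        start = rng.randrange(n - span + 1)
        # Reject spans that overlap or touch an already masked span.
        lo, hi = max(0, start - 1), min(n, start + span + 1)
        if any(masked[lo:hi]):
            continue
        masked[start:start + span] = [True] * span
        budget -= span

    inputs, targets, sentinel, i = [], [], 0, 0
    while i < n:
        if masked[i]:
            tag = f"<extra_id_{sentinel}>"
            inputs.append(tag)
            targets.append(tag)
            while i < n and masked[i]:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

def make_example(tokens, rng):
    """Pick one denoiser at random and prepend its mode token to the inputs."""
    mode = rng.choice(["[R]", "[S]", "[X]"])
    if mode == "[S]":
        # Sequential denoising: predict the suffix from the prefix.
        split = rng.randrange(1, len(tokens))
        inputs = tokens[:split] + ["<extra_id_0>"]
        targets = ["<extra_id_0>"] + tokens[split:]
    else:
        inputs, targets = corrupt_spans(tokens, rng=rng, **SPAN_DENOISERS[mode])
    return [mode] + inputs, targets

rng = random.Random(0)
tokens = "мама мыла раму а папа читал газету на диване".split()
inputs, targets = make_example(tokens, rng)
print(" ".join(inputs))
print(" ".join(targets))
```

The sketch works on whitespace tokens for readability; a real pipeline would operate on tokenizer IDs and pack examples up to the 1024-token sequence length.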

  • Model type: RoPE T5
  • Language(s) (NLP): Russian, English
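The following is a minimal sketch of rotary position encoding, included to show why it can stand in for T5's additive relative position bias. Each pair of vector components is rotated by an angle proportional to the token position, so the query-key dot product depends only on the relative offset and no bias term has to be added to the attention scores. The function names, the base of 10000, and the interleaved pairing are assumptions; the model's actual implementation may differ.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive component pairs of `vec` by position-dependent angles."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Same relative offset (3) at two different absolute positions.
near = dot(rope(q, 5), rope(k, 2))
far = dot(rope(q, 105), rope(k, 102))
print(near, far)
```

The two printed scores agree up to floating-point error, since both pairs are three positions apart. Because the position information lives in the rotated queries and keys, fused attention kernels that do not accept an arbitrary additive bias can be used unchanged.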

Uses

Finetuning for downstream tasks

Bias, Risks, and Limitations

Despite the instruction tuning, the model is not recommended for zero-shot use because of its small size.

Training Details

Training Data

A corpus of Russian texts from Vikhr, filtered by FRED-T5-1.7B perplexity. The instruction data is an English instruction set translated into Russian.
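As a rough illustration of perplexity-based filtering, the sketch below computes perplexity from per-token log-probabilities and keeps texts under a cutoff. In practice the log-probabilities would come from the scoring model (FRED-T5-1.7B here). The numbers, the threshold, the filtering direction, and the function names are hypothetical, since the card does not describe the actual procedure.

```python
import math

def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def filter_by_perplexity(scored_docs, max_ppl):
    """Keep texts whose perplexity does not exceed `max_ppl`."""
    return [text for text, logprobs in scored_docs if perplexity(logprobs) <= max_ppl]

# Made-up log-probabilities standing in for real model scores.
scored_docs = [
    ("fluent text", [math.log(0.5)] * 4),    # perplexity about 2
    ("noisy text", [math.log(0.01)] * 4),    # perplexity about 100
]
kept = filter_by_perplexity(scored_docs, max_ppl=50.0)  # hypothetical cutoff
print(kept)
```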

Training Procedure

AdamWScale was used instead of Adafactor for stable training without loss explosions.
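The card does not describe the optimizer's internals. The sketch below shows the idea commonly associated with the name AdamWScale: a standard AdamW update whose step size is additionally multiplied by the root-mean-square of the parameter, a trick borrowed from Adafactor. It is written in plain Python over lists for clarity; the function names, hyperparameter defaults, and the `1e-3` floor are assumptions, not the exact code used to train this model.

```python
import math

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def init_state(n):
    return {"step": 0, "m": [0.0] * n, "v": [0.0] * n}

def adamw_scale_step(param, grad, state, lr=1e-2, betas=(0.9, 0.999),
                     eps=1e-6, weight_decay=0.0):
    """One AdamW update with the step scaled by the parameter's RMS."""
    beta1, beta2 = betas
    state["step"] += 1
    t = state["step"]
    # Exponential moving averages of the gradient and its square.
    m = state["m"] = [beta1 * mi + (1 - beta1) * g for mi, g in zip(state["m"], grad)]
    v = state["v"] = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(state["v"], grad)]
    # Adam bias correction folded into the step size.
    step_size = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # Adafactor-style scaling: larger parameters take proportionally larger steps.
    step_size *= max(1e-3, rms(param))
    new = [p - step_size * mi / (math.sqrt(vi) + eps) for p, mi, vi in zip(param, m, v)]
    if weight_decay > 0.0:
        # Decoupled weight decay, applied after the gradient step.
        new = [p * (1 - lr * weight_decay) for p in new]
    return new

param = [3.0, -4.0]
grad = [1.0, -1.0]
updated = adamw_scale_step(param, grad, init_state(2))
print(updated)
```

On the first step the update is close to `lr * rms(param)` in magnitude for each component, so the effective learning rate tracks the scale of the weights rather than being fixed.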

Metrics

Russian SuperGLUE (rsg)

Model Card Contact

@TheMelmoth

Model size: 77.1M params (Safetensors)
Tensor type: F32

Datasets used to train melmoth/ru-rope-t5-small-instruct