This is a custom Hugging Face port of a PyTorch implementation of the original Transformer model, introduced in the 2017 paper "Attention Is All You Need". This is the 65M-parameter base model, trained for English-to-German translation.

Usage:

from transformers import AutoModel, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("ubaada/original-transformer", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ubaada/original-transformer")

# Tokenize the English input (truncated to 100 tokens) and generate the German translation
text = 'This is my cat'
output = model.generate(**tokenizer(text, return_tensors="pt", add_special_tokens=True, truncation=True, max_length=100))
tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
# Output: ' Das ist meine Katze.'

(Remember to pass trust_remote_code=True when loading the model, because it is defined in a custom modeling file.)
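
To translate several sentences, the call pattern from the usage snippet can be wrapped in a small helper. This is only a sketch: translate_all is a hypothetical name (not part of this repo), it assumes model and tokenizer were loaded as shown above, and it goes one sentence at a time because batched generation is not something this card confirms for the custom modeling code.

```python
def translate_all(sentences, model, tokenizer, max_length=100):
    """Translate sentences one at a time (hypothetical helper, not part of the repo)."""
    translations = []
    for sentence in sentences:
        # Same tokenizer arguments as in the usage snippet
        inputs = tokenizer(
            sentence,
            return_tensors="pt",
            add_special_tokens=True,
            truncation=True,
            max_length=max_length,
        )
        output = model.generate(**inputs)
        decoded = tokenizer.decode(
            output[0],
            skip_special_tokens=True,
            clean_up_tokenization_spaces=True,
        )
        # Strip the leading space seen in the example output
        translations.append(decoded.strip())
    return translations

# Usage, assuming `model` and `tokenizer` are already loaded:
# translate_all(["This is my cat", "Where is the station?"], model, tokenizer)
```

The helper takes the model and tokenizer as arguments instead of loading them itself, so the checkpoint is only downloaded once.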

Training:

Parameter             Value
Dataset               WMT14-de-en
Translation Pairs     4.5M (135M tokens total)
Epochs                24
Batch Size            16
Accumulation Steps    8
Effective Batch Size  128 (16 * 8)
Training Script       train.py
Optimiser             Adam (learning rate = 0.0001)
Loss Type             Cross Entropy
Final Test Loss       1.87
GPU                   RTX 4070 (12GB)
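
A few derived numbers can be worked out from the training figures above. The sketch below is rough arithmetic only: the 4.5M and 135M figures are rounded, so the step counts are approximate, and the perplexity line assumes the reported loss is a mean per-token cross entropy in nats (the PyTorch default), which the card does not state explicitly.

```python
import math

# Figures taken from the training table (rounded values)
translation_pairs = 4_500_000
total_tokens = 135_000_000
epochs = 24
batch_size = 16
accumulation_steps = 8
final_test_loss = 1.87

# Gradients from 8 micro-batches of 16 are accumulated before each optimiser step
effective_batch_size = batch_size * accumulation_steps

# Approximate number of optimiser steps
steps_per_epoch = math.ceil(translation_pairs / effective_batch_size)
total_steps = steps_per_epoch * epochs

# Average token count per translation pair, as counted by the 135M figure
tokens_per_pair = total_tokens / translation_pairs

# Assumption: loss is mean per-token cross entropy in nats
perplexity = math.exp(final_test_loss)

print(effective_batch_size)   # 128
print(steps_per_epoch)        # 35157
print(total_steps)            # 843768
print(tokens_per_pair)        # 30.0
print(f"{perplexity:.2f}")    # 6.49
```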

Results:
