How to use indiejoseph/bart-base-cantonese with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("indiejoseph/bart-base-cantonese")
model = AutoModelForSeq2SeqLM.from_pretrained("indiejoseph/bart-base-cantonese")

This model is a continued pre-training of fnlp/bart-base-chinese on a filtered Cantonese Common Crawl dataset of 472M tokens.
The tokenizer extends the BERT tokenizer from fnlp/bart-base-chinese with 100 additional Chinese characters commonly found in Cantonese.
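The card does not say how the 100 extra characters were chosen. As a rough sketch only, one plausible approach is to count the Chinese characters in the corpus that are missing from the base vocabulary and keep the most frequent ones. The function name `select_new_characters`, the toy vocabulary, and the toy corpus below are all hypothetical and are not taken from the author's pipeline.

```python
from collections import Counter


def is_cjk(ch: str) -> bool:
    """Return True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def select_new_characters(corpus, base_vocab, limit=100):
    """Pick the most frequent CJK characters in `corpus` that are absent from `base_vocab`.

    Hypothetical helper: an assumed selection strategy, not the author's documented method.
    """
    counts = Counter(
        ch
        for line in corpus
        for ch in line
        if is_cjk(ch) and ch not in base_vocab
    )
    return [ch for ch, _ in counts.most_common(limit)]


# Toy data standing in for the base tokenizer vocabulary and the Cantonese corpus.
base_vocab = {"我", "你", "好", "食", "飯"}
corpus = ["我食咗飯", "你食咗飯未", "佢哋好嘅"]

new_chars = select_new_characters(corpus, base_vocab, limit=2)
print(new_chars)
```

With Transformers, characters selected this way could then be registered with `tokenizer.add_tokens(new_chars)`, followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix matches the enlarged vocabulary before continued pre-training.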
The following hyperparameters were used during training: