| library_name: transformers | |
| language: | |
| - yue | |
| license: cc-by-4.0 | |
| tags: | |
| - generated_from_trainer | |
| pipeline_tag: fill-mask | |
| widget: | |
| - text: 香港原本[MASK]一個人煙稀少嘅漁港。 | |
| example_title: 係 | |
| model-index: | |
| - name: bert-large-cantonese | |
| results: [] | |
| # bert-large-cantonese | |
| ## Description | |
| This model was trained from scratch on Cantonese text. It uses the BERT-large architecture (24 layers, hidden size 1024, 16 attention heads, 326M parameters). | |
| Training ran in two stages. The first stage pre-trained the model on sequences of length 128 with a batch size of 512 for 1 epoch. The second stage continued pre-training on sequences of length 512 with a batch size of 512 for one more epoch. | |
| ## How to use | |
| You can use this model directly with a pipeline for masked language modeling: | |
| ```python | |
| from transformers import pipeline | |
| mask_filler = pipeline( | |
| "fill-mask", | |
| model="hon9kon9ize/bert-large-cantonese" | |
| ) | |
| mask_filler("雞蛋六隻,糖呢就兩茶匙,仲有[MASK]橙皮添。") | |
| # [{'score': 0.08160534501075745, | |
| #   'token': 943, | |
| #   'token_str': '個', | |
| #   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 個 橙 皮 添 。'}, | |
| #  {'score': 0.06182105466723442, | |
| #   'token': 1576, | |
| #   'token_str': '啲', | |
| #   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 啲 橙 皮 添 。'}, | |
| #  {'score': 0.04600336775183678, | |
| #   'token': 1646, | |
| #   'token_str': '嘅', | |
| #   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 嘅 橙 皮 添 。'}, | |
| #  {'score': 0.03743772581219673, | |
| #   'token': 3581, | |
| #   'token_str': '橙', | |
| #   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 橙 橙 皮 添 。'}, | |
| #  {'score': 0.031560592353343964, | |
| #   'token': 5148, | |
| #   'token_str': '紅', | |
| #   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 紅 橙 皮 添 。'}] | |
| ``` | |
| ## Training hyperparameters | |
| The following hyperparameters were used during the first training stage: | |
| - Batch size: 512 | |
| - Learning rate: 1e-4 | |
| - Learning rate scheduler: linear decay | |
| - Epochs: 1 | |
| - Warmup ratio: 0.1 | |
| Loss plot on [WandB](https://api.wandb.ai/links/indiejoseph/v3ljlpmp) | |
| The following hyperparameters were used during the second training stage: | |
| - Batch size: 512 | |
| - Learning rate: 5e-5 | |
| - Learning rate scheduler: linear decay | |
| - Epochs: 1 | |
| - Warmup ratio: 0.1 | |
| Loss plot on [WandB](https://api.wandb.ai/links/indiejoseph/vcm3q1ef) | |
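
To make the schedule settings above concrete, here is a minimal sketch of how a learning rate could evolve with a warmup ratio of 0.1 followed by linear decay to zero. It assumes the warmup itself is linear, which the card does not state, and it is not the training code used for this model. The function name `linear_warmup_decay_lr` and the value of `TOTAL_STEPS` are hypothetical; the actual number of optimizer steps per stage is not given here.

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup to `peak_lr`, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from the peak to 0 at the final step.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Hypothetical step count, chosen only for illustration.
TOTAL_STEPS = 10_000

# Peak learning rates taken from the two stages listed above.
STAGE_PEAK_LR = {
    "stage 1 (sequence length 128)": 1e-4,
    "stage 2 (sequence length 512)": 5e-5,
}

for stage, peak_lr in STAGE_PEAK_LR.items():
    for step in (0, 500, 1_000, 5_500, 10_000):
        lr = linear_warmup_decay_lr(step, TOTAL_STEPS, peak_lr)
        print(f"{stage} | step {step:>6} | lr {lr:.2e}")
```

Under these assumptions the rate peaks after the first 10% of steps and reaches zero at the end of each stage, so the second stage restarts its own warmup at the lower peak of 5e-5.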