---
license: apache-2.0
language:
- cy
- en
datasets:
- yahma/alpaca-cleaned
- allenai/MADLAD-400
---
![Mistral-7B-Cymraeg-Welsh](https://huggingface.co/BangorAI/Mistral-7B-Cymraeg-Welsh-v2/resolve/main/draig.jpeg)

# Mistral-7B-Cymraeg-Welsh-v2 #

This is a bilingual Mistral chat/instruct model trained in both English and Welsh.

The model is based on [BangorAI/mistral-7b-cy-epoch-2](https://huggingface.co/BangorAI/mistral-7b-cy-epoch-2), which is a continual pre-training of the [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) model on Welsh data from the [allenai/MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400) dataset for 2 epochs.

The model was then fine-tuned on the [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) dataset in both Welsh and English, also for 2 epochs.

## Demo ##

An online demo of the model is available at [https://demo.bangor.ai](https://demo.bangor.ai).

This is an experimental LLM, so do not treat any response from the model as authoritative or factually correct. You are responsible for any output you generate.

## Format ##

The LLM uses the Llama-2 format for its prompts:

```
<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
```

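As a minimal sketch, the template above can be filled in with plain string formatting. The helper name `build_prompt`, its `include_bos` flag, and the sample question are hypothetical and not part of the model's tooling. The flag is there because some tokenizers prepend the `<s>` token themselves, in which case the literal `<s>` should probably be left out of the string.

```python
def build_prompt(system_prompt: str, user_message: str, include_bos: bool = True) -> str:
    """Fill the single-turn Llama-2 style template shown above.

    `include_bos` is a hypothetical convenience flag: set it to False if
    your tokenizer already adds the <s> token on its own.
    """
    bos = "<s>" if include_bos else ""
    return (
        f"{bos}[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        f"<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )


# Illustrative usage; the user question is made up for this example.
prompt = build_prompt(
    "You are a helpful assistant that responds truthfully, logically and in detail. Answer in English.",
    "What is the capital of Wales?",
)
print(prompt)
```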
The language of the system prompt guides the LLM as to which language it should respond in.

For example, in English:

```
<s>[INST] <<SYS>>
You are a helpful assistant that responds truthfully, logically and in detail. Answer in English.
<</SYS>>

{{ user_message }} [/INST]
```

Similarly, for responses in Welsh:

```
<s>[INST] <<SYS>>
Rydych chi'n gynorthwydd cymwynasgar sy'n barod i ateb unrhyw gwestiwn yn ffyddlon. Ymatebwch i gwestiwn y defnyddiwr yn llawn a gyda ffeithiau cywir yn y Gymraeg.
<</SYS>>

{{ user_message }} [/INST]
```

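One possible way to switch the response language in code is to keep the two system prompts in a lookup table keyed by language code. This is only a sketch: the names `SYSTEM_PROMPTS`, `TEMPLATE`, and `prompt_for_language` are hypothetical, the `en`/`cy` keys are borrowed from the language codes in this card's metadata, and the Welsh sample question is made up for illustration.

```python
# System prompts copied verbatim from the two examples above.
SYSTEM_PROMPTS = {
    "en": (
        "You are a helpful assistant that responds truthfully, "
        "logically and in detail. Answer in English."
    ),
    "cy": (
        "Rydych chi'n gynorthwydd cymwynasgar sy'n barod i ateb unrhyw "
        "gwestiwn yn ffyddlon. Ymatebwch i gwestiwn y defnyddiwr yn llawn "
        "a gyda ffeithiau cywir yn y Gymraeg."
    ),
}

# Single-turn prompt template from the Format section.
TEMPLATE = "<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"


def prompt_for_language(lang: str, user_message: str) -> str:
    """Build a prompt whose system message steers the reply language."""
    if lang not in SYSTEM_PROMPTS:
        raise ValueError(f"unsupported language code: {lang!r}")
    return TEMPLATE.format(
        system_prompt=SYSTEM_PROMPTS[lang],
        user_message=user_message,
    )


# Ask for a Welsh answer ("What is the capital of Wales?").
print(prompt_for_language("cy", "Beth yw prifddinas Cymru?"))
```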
---

# Mistral-7B-Cymraeg-Welsh-v2 #

Mae hwn yn fodel sgwrsio / cyfarwyddo Mistral dwyieithog wedi'i hyfforddi yn y Gymraeg a'r Saesneg.

Mae'r model yn seiliedig ar [BangorAI/mistral-7b-cy-epoch-2](https://huggingface.co/BangorAI/mistral-7b-cy-epoch-2) sy'n rhaghyfforddiant parhaus o fodel [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) gyda data [allenai/MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400) ar gyfer 2 epoch.

Cafodd y model hyfforddiant cywrain pellach gan ddefnyddio'r dataset [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) yn Gymraeg a Saesneg, hefyd am 2 epoch.

## Demo Byw ##

Mae fersiwn o'r model i'w gael yma am sgwrs: [https://demo.bangor.ai](https://demo.bangor.ai).

LLM arbrofol ydyw, felly peidiwch â chymryd unrhyw ymateb gan y model o ddifri.

## Fformat Sgwrs ##

Mae iaith y "system prompt" yn arwain yr LLM i ymateb yn y Gymraeg neu'r Saesneg.

Er enghraifft, ar gyfer y Gymraeg:

```
<s>[INST] <<SYS>>
Rydych chi'n gynorthwydd cymwynasgar sy'n barod i ateb unrhyw gwestiwn yn ffyddlon. Ymatebwch i gwestiwn y defnyddiwr yn llawn a gyda ffeithiau cywir yn y Gymraeg.
<</SYS>>

{{ user_message }} [/INST]
```

Yn yr un modd, ar gyfer atebion yn Saesneg:

```
<s>[INST] <<SYS>>
You are a helpful assistant that responds truthfully, logically and in detail. Answer in English.
<</SYS>>

{{ user_message }} [/INST]
```

---

*Contains information from [allenai/MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400), which is made available under the ODC Attribution License.*