Update README.md
README.md
CHANGED
@@ -18,7 +18,7 @@ This is a preview release of the Shisa V2 bilingual Japanese and English (JA/EN)

It is a fine tune of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) and inherits the tokenizer ([JA efficiency](https://github.com/shisa-ai/shisa-v2/blob/main/eval/tokenizer-efficiency/tokenizer-eval-ja.md)) and context length (128K).

-While we're still working hard on this model (including integrating additional datasets and applying several more post-training stages) this model already shows a significant performace leap over our prior published models, including beating our Llama 3 70B tune from last year almost across the board on our evals. We're releasing this WIP preview to celebrate since we also just noticed that [shisa-gamma-7b-v1 hit 1 milion downloads]! 🥳 (OK, it's not [1 billion](https://about.fb.com/news/2025/03/celebrating-1-billion-downloads-llama/) but it's still nothing to sneeze at!)
+While we're still working hard on this model (including integrating additional datasets and applying several more post-training stages), it already shows a significant performance leap over our prior published models, including beating our Llama 3 70B tune from last year almost across the board on our evals. We're releasing this WIP preview to celebrate, since we also just noticed that [shisa-gamma-7b-v1 hit 1 million downloads](https://shisa.ai/posts/shisa-gamma-7b-v1-1-million-downloads/)! 🥳 (OK, it's not [1 billion](https://about.fb.com/news/2025/03/celebrating-1-billion-downloads-llama/), but it's still nothing to sneeze at!)

## Evals

@@ -46,15 +46,15 @@ Not only is this our best model yet, from our testing, this model is also curren

Testing notes:
- JA functional tests are done with the [shisa-ai/shaberi](https://github.com/shisa-ai/shaberi/) fork using a [PoLL](https://arxiv.org/abs/2404.18796) of [Tulu 3 405B FP8](https://huggingface.co/shisa-ai/Llama-3.1-Tulu-3-405B-FP8-Dynamic), [Llama 3.3 70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), and [Athene-V2](https://huggingface.co/Nexusflow/Athene-V2-Chat) that was tested to be roughly comparable to `gpt-4-1106-preview` for scoring
-- Gemini 2 9B models aren't tested atm due to lack of system prompt breaking multiple evals...
+- Gemma 2 9B models aren't tested atm because their lack of system prompt support breaks multiple evals
- Dynamic RoPE extension is used when necessary for testing models w/ a 4K context window
-- Sarashina2.2-Instruct 3B included since they [claim to achieve 7-8B class performance](https://www.sbintuitions.co.jp/blog/entry/2025/03/07/093143) and I was curious if that panned out (it seems so!)
+- Sarashina2.2-Instruct 3B is included since they [claim to achieve 7-8B class performance](https://www.sbintuitions.co.jp/blog/entry/2025/03/07/093143) and I was curious if that panned out (it seems so; it kicks butt on the Shaberi functional tests!)

## Data
-Our final release will have full details, but this currently model is largely based off of work done on [Shisa V1](https://huggingface.co/augmxnt/shisa-7b-v1), but refined, filtered, regenerated, annotated, rated, and selected. This is augmented by additional datasets focused on translation, multi-turn chat, role-play and other real-world tasks. All synthetic data was regenerated from open weight models. This model currently has only a single DPO stage using a placeholder (but surprisingly good!) [EN preference set](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm)
+Our final release will have full details, but this current model is largely based on the work done for [Shisa V1](https://huggingface.co/augmxnt/shisa-7b-v1), refined, filtered, regenerated, annotated, rated, and selected. This is augmented by additional datasets focused on translation, multi-turn chat, role-play, and other real-world tasks. All synthetic data was regenerated from open-weight models. This model currently has only a single DPO stage, using a placeholder (but surprisingly good!) [EN preference set](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) and a custom RP DPO mix.

## Credits
-Trained by [Shisa.AI](https://shisa.ai/): [Leonard Lin](https://huggingface.co/leonardlin)
+Trained by [Shisa.AI](https://shisa.ai/): [Leonard Lin](https://huggingface.co/leonardlin) and [Adam Lensenmayer](https://huggingface.co/NekoMikoReimu)

Compute sponsored by <a href="https://ubitus.net/">Ubitus K.K.</a> and <a href="https://www.meti.go.jp/english/policy/mono_info_service/geniac/index.html">METI GENIAC</a>.
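
For anyone curious how the PoLL judging mentioned in the testing notes works in principle: each judge model scores a response independently and the panel score is just the mean. Here's a minimal sketch of that idea (not the actual shaberi code; the endpoint, rating prompt, and 1-10 scale are assumptions):

```python
# Illustrative sketch of PoLL ("Panel of LLM judges") scoring: each judge rates
# an answer independently and the panel score is the mean. Endpoint, prompt,
# and scale are assumptions, not the shaberi implementation.
import re
from statistics import mean

from openai import OpenAI  # any OpenAI-compatible server (e.g. vLLM) works

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder endpoint

# Panel members named in the testing notes (assumed to be served locally).
JUDGES = [
    "shisa-ai/Llama-3.1-Tulu-3-405B-FP8-Dynamic",
    "meta-llama/Llama-3.3-70B-Instruct",
    "Nexusflow/Athene-V2-Chat",
]


def judge_score(judge: str, question: str, answer: str) -> float:
    """Ask one judge for a 1-10 rating and parse the first number in its reply."""
    prompt = (
        "Rate the following answer to the question on a scale of 1-10. "
        f"Reply with only the number.\n\nQuestion: {question}\n\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model=judge,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    match = re.search(r"\d+(\.\d+)?", resp.choices[0].message.content)
    return float(match.group()) if match else 0.0


def poll_score(question: str, answer: str) -> float:
    """Panel score = mean of the individual judges' scores."""
    return mean(judge_score(j, question, answer) for j in JUDGES)
```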
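
The dynamic RoPE extension note above refers to rescaling rotary embeddings at load time so a 4K-context model can be prompted past its native window. A minimal sketch with Hugging Face transformers, assuming a Llama-style model that supports `rope_scaling` (the repo id and factor are placeholders, not the exact harness settings):

```python
# Hedged sketch: enable dynamic (NTK-aware) RoPE scaling when loading a model
# whose native context window is 4K, so longer eval prompts aren't truncated.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-4k-context-model"  # hypothetical repo id

config = AutoConfig.from_pretrained(model_id)
# Rescale rotary embeddings on the fly once inputs exceed the original window.
# Newer transformers versions also accept the key "rope_type".
config.rope_scaling = {"type": "dynamic", "factor": 2.0}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```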
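
For a sense of what a single DPO stage over the linked EN preference set could look like, here's a rough sketch using TRL's `DPOTrainer`. The base checkpoint, hyperparameters, and column handling are illustrative assumptions, not the actual Shisa V2 recipe:

```python
# Hedged sketch of one DPO stage on the EN preference set named in the Data
# section, via TRL. Everything below is illustrative, not the training recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Preference pairs; column names assumed from the dataset card.
dataset = load_dataset("princeton-nlp/gemma2-ultrafeedback-armorm", split="train")
dataset = dataset.select_columns(["prompt", "chosen", "rejected"])

args = DPOConfig(
    output_dir="dpo-sketch",
    beta=0.1,                       # DPO temperature (illustrative value)
    learning_rate=5e-7,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,     # older TRL versions use tokenizer= instead
)
trainer.train()
```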