Update README.md
README.md CHANGED
@@ -10,6 +10,10 @@ datasets:
 - princeton-nlp/gemma2-ultrafeedback-armorm
 ---
 
+**UPDATE:** We have since released the full [Shisa V2](https://huggingface.co/collections/shisa-ai/shisa-v2-67fc98ecaf940ad6c49f5689) family of models. See our announcement at [https://shisa.ai/posts/shisa-v2/](https://shisa.ai/posts/shisa-v2/)
+
+---
+
 *Per the Llama 3.1 Community License Agreement, the official name of this model is "Llama 3.1 shisa-v2-llama3.1-8b-preview"*
 
 # shisa-v2-llama-3.1-8b-preview
@@ -48,7 +52,7 @@ Testing notes:
 - JA functional tests are done with the [shisa-ai/shaberi](https://github.com/shisa-ai/shaberi/) fork using a [PoLL](https://arxiv.org/abs/2404.18796) of [Tulu 3 405B FP8](https://huggingface.co/shisa-ai/Llama-3.1-Tulu-3-405B-FP8-Dynamic), [Llama 3.3 70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), and [Athene-V2](https://huggingface.co/Nexusflow/Athene-V2-Chat), which was tested to be roughly comparable to `gpt-4-1106-preview` for scoring (a minimal scoring sketch follows below the diff)
 - Gemma 2 9B models aren't tested at the moment because their lack of system prompt support breaks multiple evals
 - Dynamic RoPE extension is used when necessary for testing models with a 4K context window (see the RoPE sketch below the diff)
-- Sarashina2.2-Instruct 3B is included since they [claim to achieve 7-8B class performance](https://www.sbintuitions.co.jp/blog/entry/2025/03/07/093143) and I was curious if that panned out (it seems so, kicks
+- Sarashina2.2-Instruct 3B is included since they [claim to achieve 7-8B class performance](https://www.sbintuitions.co.jp/blog/entry/2025/03/07/093143) and I was curious if that panned out (it seems so, it kicks butt on the Shaberi functional tests)
 
 ## Data
 Our final release will have full details, but this model is currently based largely on the work done for [Shisa V1](https://huggingface.co/augmxnt/shisa-7b-v1), refined, filtered, regenerated, annotated, rated, and selected. It is augmented by additional datasets focused on translation, multi-turn chat, role-play, and other real-world tasks. All synthetic data was regenerated from open-weight models. This model currently has only a single DPO stage using a placeholder (but surprisingly good!) [EN preference set](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) and a custom RP DPO mix (a rough DPO sketch follows below the diff).
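For anyone unfamiliar with the PoLL setup mentioned in the testing notes, here is a minimal sketch of the idea: each answer is scored independently by the three judge models (served behind OpenAI-compatible endpoints) and the scores are averaged. The endpoint URLs, model names, and judging prompt below are placeholders, not the actual shaberi configuration.

```python
# Minimal PoLL (panel of LLM judges) scoring sketch. Endpoints, model names,
# and the judging prompt are placeholders, not shaberi's actual config.
from statistics import mean
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoints serving the three judge models.
JUDGES = [
    ("tulu-3-405b-fp8", "http://localhost:8001/v1"),
    ("llama-3.3-70b-instruct", "http://localhost:8002/v1"),
    ("athene-v2-chat", "http://localhost:8003/v1"),
]

def poll_score(question: str, answer: str) -> float:
    """Average the 1-10 ratings from each judge on the panel."""
    scores = []
    for model, base_url in JUDGES:
        client = OpenAI(base_url=base_url, api_key="EMPTY")
        resp = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": (
                    "Rate the answer to the question on a 1-10 scale. "
                    "Reply with only the number.\n"
                    f"Question: {question}\nAnswer: {answer}"
                ),
            }],
            temperature=0,
        )
        scores.append(float(resp.choices[0].message.content.strip()))
    return mean(scores)
```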
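The "dynamic RoPE extension" note in the testing section roughly corresponds to overriding a model's `rope_scaling` config before loading it, so a 4K-context model can still be evaluated on longer prompts. This is a sketch under assumptions: the model id and scaling factor are illustrative, the exact key (`type` vs. `rope_type`) depends on your transformers version, and it is not necessarily how shaberi wires it up.

```python
# Sketch: enable dynamic NTK RoPE scaling for a model trained with a 4K
# context window. Model id and factor are illustrative; check your
# transformers version for the exact rope_scaling schema.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-4k-context-model"  # placeholder

config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {"type": "dynamic", "factor": 2.0}  # newer versions use "rope_type"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config, torch_dtype="auto")
```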
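Finally, a rough sketch of what a single DPO stage on the linked EN preference set could look like with trl. This is not the actual Shisa training recipe: the base model and hyperparameters are placeholders, and the `DPOTrainer` signature varies a bit across trl versions.

```python
# Rough sketch of a single DPO stage on the linked EN preference set.
# Not the actual Shisa recipe: base model and hyperparameters are
# placeholders, and the trl DPOTrainer API differs between versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

# Preference pairs; assumed to expose prompt / chosen / rejected columns.
dataset = load_dataset("princeton-nlp/gemma2-ultrafeedback-armorm", split="train")

args = DPOConfig(
    output_dir="shisa-dpo-sketch",
    beta=0.1,                       # placeholder DPO temperature
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-7,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```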