---
license: apache-2.0
datasets:
- shisa-ai/shisa-v2-sharegpt
- shisa-ai/shisa-v2-405b-ultrafeedback-armorm
language:
- ja
- en
base_model:
- Qwen/Qwen3-8B
---

This is a WIP version of Qwen3 8B post-trained with the full Shisa V2 recipe. It is a *non-reasoning* model: thinking has been disabled in the default `chat_template`.

This model will shortly be replaced by a V2.1 release, but preliminary benchmarks suggest it is already quite strong.

Shaberi (judged by GPT-4.1):

| Model                               | Average  | ELYZA 100 | JA-MT    | Rakuda   | Tengu    |
|-------------------------------------|----------|-----------|----------|----------|----------|
| 017-qwen3-8b-v2-dpo405b-clr-nothink | **7.75** | **7.88**  | **8.08** | **8.08** | **6.94** |
| shisa-ai/shisa-v2-llama3.1-8b       | 7.14     | 7.54      | 6.83     | 7.85     | 6.34     |
| shisa-ai/shisa-v2-qwen2.5-7b        | 7.10     | 7.48      | 7.40     | 7.18     | 6.33     |

And JA MT-Bench (judged by GPT-4.1):

| Model                               | coding  | extraction | humanities | math    | reasoning | roleplay | stem    | writing | Overall  |
|-------------------------------------|---------|------------|------------|---------|-----------|----------|---------|---------|----------|
| 017-qwen3-8b-v2-dpo405b-clr-nothink | **7.3** | **7.55**   | **8.85**   | **9.3** | **6.05**  | **7.9**  | **8.6** | **8.9** | **8.06** |
| shisa-ai/shisa-v2-qwen2.5-7b        | 6.7     | 7.15       | 7.55       | 8.5     | 5.4       | **7.9**  | 7.5     | 7.7     | 7.3      |
| shisa-ai/shisa-v2-llama3.1-8b       | 5.3     | 6.95       | 8.4        | 6.55    | 5.95      | 7.65     | 7.25    | 7.9     | 6.99     |