Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
- tokyotech-llm/swallow-code
  Viewer • Updated • 129M • 3.04k • 52
- tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0002500
  8B • Updated • 4
- tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0005000
  8B • Updated • 4
- tokyotech-llm/Llama-3.1-8B-code-ablation-exp1-LR2.5e-5-MINLR2.5E-6-WD0.1-iter0007500
  8B • Updated • 5

- tokyotech-llm/Llama-3-Swallow-8B-v0.1
  Text Generation • 8B • Updated • 1.08k • 11
- tokyotech-llm/Llama-3-Swallow-70B-v0.1
  Text Generation • 71B • Updated • 150 • 5
- tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
  Text Generation • 8B • Updated • 10.9k • 20
- tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1
  Text Generation • 71B • Updated • 340 • 7
Swallow instruction tuning models
- tokyotech-llm/Swallow-13b-instruct-v0.1
  Text Generation • 13B • Updated • 33 • 1
- tokyotech-llm/Swallow-7b-instruct-v0.1
  Text Generation • 7B • Updated • 979 • 3
- tokyotech-llm/Swallow-70b-instruct-v0.1
  Text Generation • 69B • Updated • 9
- tokyotech-llm/Swallow-70b-NVE-instruct-hf
  Text Generation • 69B • Updated • 1.23k • 2
Swallow MX(Mixtral) models
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
- tokyotech-llm/swallow-math
  Viewer • Updated • 4.33M • 1.3k • 31
- tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0002500
  8B • Updated • 9
- tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0005000
  8B • Updated • 8
- tokyotech-llm/Llama-3.1-8B-math-ablation-exp2-LR2.5e-5-WD0.1-iter0007500
  8B • Updated • 7

- tokyotech-llm/Gemma-2-Llama-Swallow-27b-pt-v0.1
  Text Generation • 27B • Updated • 45
- tokyotech-llm/Gemma-2-Llama-Swallow-9b-pt-v0.1
  Text Generation • 9B • Updated • 126 • 1
- tokyotech-llm/Gemma-2-Llama-Swallow-2b-pt-v0.1
  Text Generation • 3B • Updated • 597
- tokyotech-llm/Gemma-2-Llama-Swallow-2b-it-v0.1
  Text Generation • 3B • Updated • 1.17k • 2

- tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5
  Text Generation • 8B • Updated • 9.53k • 10
- tokyotech-llm/Llama-3.1-Swallow-8B-v0.5
  8B • Updated • 10.6k • 5
- tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3
  Text Generation • 71B • Updated • 1.01k • 13
- tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3
  Text Generation • 8B • Updated • 4.22k • 21
Continual Pre-Training from Llama 2
Swallow MS(Mistral) models