metadata
library_name: transformers
license: other
language:
- ja
🐟 EvoLLM-JP-v1-7B
🤗 Models | 📚 Paper | 📝 Blog | 🐦 Twitter
EvoLLM-JP-v1-7B is a Japanese math LLM created with Evolutionary Model Merge.
Model Details
Model Description
EvoLLM-JP-v1-7B is a Japanese math LLM created by merging the source models listed below in the parameter space (PS) with Evolutionary Model Merge.
- Developed by: Sakana AI
- Model type: Autoregressive Language Model
- Language(s): Japanese
- License: MICROSOFT RESEARCH LICENSE TERMS
- Source models: Shisa Gamma 7B v1, WizardMath 7B V1.1, Abel 7B 002 (models 1, 2, and 3 in the Evaluation table)
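To make "merging in the parameter space" concrete, here is a minimal sketch. It is not the recipe used for this model: it assumes the models share an architecture, so their parameters line up by name, and it combines them with a plain weighted average. The toy dictionaries, the function name `merge_parameters`, and the weights are all hypothetical. Evolutionary Model Merge searches for the merge configuration with an evolutionary algorithm rather than fixing it by hand; see the paper for the actual procedure.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def merge_parameters(models: List[Params], weights: List[float]) -> Params:
    """Weighted average of same-named parameters across models (illustrative only)."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    merged: Params = {}
    for name in models[0]:
        length = len(models[0][name])
        # Average each scalar entry across models, normalised by the weight sum.
        merged[name] = [
            sum(w * m[name][i] for m, w in zip(models, weights)) / total
            for i in range(length)
        ]
    return merged


# Hypothetical two-parameter "models", for illustration only.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]}

merged = merge_parameters([model_a, model_b], weights=[0.5, 0.5])
print(merged)
```

With real checkpoints the same idea would apply to tensors in each model's state dict, and the per-model weights are the kind of quantity a search procedure could optimise against a validation score.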
Model Sources
- Repository: SakanaAI/evolutionary-model-merge
- Paper: TODO
- Blog: TODO
Usage
Use the code below to get started with the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# 1. load model
device = "cuda" if torch.cuda.is_available() else "cpu"  # device strings are lowercase
repo_id = "SakanaAI/EvoLLM-JP-v1-7B"
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model.to(device)
# 2. prepare inputs
template = """以下に、あるタスクを説明する指示があります。リクエストを適切に完了するための回答を日本語で記述してください。一歩一歩考えましょう。
### 指示:
{input}
### 応答:"""
text = "ミシュカは半ズボンを3本、長ズボンを3本、靴を3足買いました。半ズボンは1本$16.50でした。長ズボンは1本$22.50で、靴は1足$42でした。すべての衣類にいくら使いましたか?"
inputs = tokenizer(template.format(input=text), return_tensors="pt")
# 3. generate
output_ids = model.generate(**inputs.to(device))
output_ids = output_ids[:, inputs.input_ids.shape[1] :]
generated_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
print(generated_text)
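To reuse the prompt format for other questions, the template can be wrapped in a small helper. This is a convenience sketch only: `PROMPT_TEMPLATE` copies the template string from the usage code, while the function name `build_prompt` and the whitespace stripping are assumptions, not part of the model's API.

```python
# Same instruction template as in the usage example.
PROMPT_TEMPLATE = """以下に、あるタスクを説明する指示があります。リクエストを適切に完了するための回答を日本語で記述してください。一歩一歩考えましょう。
### 指示:
{input}
### 応答:"""


def build_prompt(question: str) -> str:
    """Insert a question into the instruction template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(input=question.strip())


question = "ミシュカは半ズボンを3本、長ズボンを3本、靴を3足買いました。半ズボンは1本$16.50でした。長ズボンは1本$22.50で、靴は1足$42でした。すべての衣類にいくら使いましたか?"
prompt = build_prompt(question)
print(prompt)
```

The resulting string can be passed to `tokenizer(prompt, return_tensors="pt")` in place of `template.format(input=text)`.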
Evaluation
For details on the evaluation, please refer to Section 4.1 of the paper.
To reproduce the results, please see our GitHub repository.
Id. | Model | Type | Params | MGSM-JA (acc ↑) |
---|---|---|---|---|
1 | Shisa Gamma 7B v1 | JA general | 7B | 9.6 |
2 | WizardMath 7B V1.1 | EN math | 7B | 18.4 |
3 | Abel 7B 002 | EN math | 7B | 30.0 |
4 | Arithmo2 Mistral 7B | EN math | 7B | 24.0 |
5 | EvoLLM-JP-v1-7B | 1+2+3 | 7B | 52.0 |
6 | EvoLLM-JP-A-v1-7B | 1+3+4 | 7B | 52.4 |
7 | EvoLLM-JP-v1-10B | 1+5 | 10B | 55.6 |
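The accuracy column is the percentage of problems answered correctly. As a rough sketch of how such a score could be computed, one can take the last number in each generated response and compare it with the reference answer. This is not the evaluation code from the repository, and the exact correctness criteria are described in the paper; the function names, the regular expression, and the sample responses below are assumptions made for illustration.

```python
import re
from typing import List, Optional

# Matches integers and decimals such as "243" or "16.50".
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(response: str) -> Optional[float]:
    """Return the last number that appears in the response, if any."""
    matches = NUMBER_RE.findall(response.replace(",", ""))
    return float(matches[-1]) if matches else None


def accuracy(responses: List[str], references: List[float]) -> float:
    """Percentage of responses whose final number equals the reference."""
    correct = 0
    for response, reference in zip(responses, references):
        predicted = extract_final_number(response)
        if predicted is not None and abs(predicted - reference) < 1e-6:
            correct += 1
    return 100.0 * correct / len(references)


# Made-up responses: the first is right (243), the second is wrong (12 vs. 10).
responses = [
    "3 × 16.50 + 3 × 22.50 + 3 × 42 = 243 なので、答えは243ドルです。",
    "答えは 12 です。",
]
references = [243.0, 10.0]
score = accuracy(responses, references)
print(score)
```

The first sample mirrors the usage example above, where 3 × 16.50 + 3 × 22.50 + 3 × 42 = 243.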
Acknowledgement
We would like to thank the developers of the source models for their contributions and for making their work available.
Citation
@misc{sakana2024evofactory,
title = {Evolutionary Optimization of Model Merging Recipes},
author = {Takuya Akiba and Makoto Shing and Yujin Tang and Qi Sun and David Ha},
year = {2024},
eprint = {TODO},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}