|
--- |
|
language: |
|
- en |
|
- ko |
|
- ja |
|
- zh |
|
library_name: transformers |
|
license: apache-2.0 |
|
pipeline_tag: text-generation |
|
tags: |
|
- finetuned |
|
- chat |
|
--- |
|
|
|
# Trillion-7B-preview |
|
|
|
<p align="center"> |
|
<picture> |
|
<img src="assets/Signiture_Black_White_BG_resized.jpg" alt="logo", width="300", style="margin: 40 auto;"> |
|
</picture> |
|
|
|
|
|
## Introduction |
|
|
|
We introduce Trillion-7B-preview, a preview of our latest large language model designed to push the boundaries of multilingual scalability and performance. The model is presented in the paper [Trillion 7B Technical Report](https://huggingface.co/papers/2504.15431).
|
|
|
|
|
Comparing performance against training FLOPs for Trillion-7B-preview and competitive models, our model pushes the Pareto frontier: it achieves around 66.5% average performance while using significantly less compute (~9.3×10²² FLOPs). It outperforms models such as Mistral-7B-Instruct-v0.3 and SOLAR-10.7B-Instruct-v1.0 while remaining competitive with models that require 3-8× more compute, such as Qwen2.5-7B-Instruct and EXAONE-3.5-7.8B-Instruct. For full benchmark results, see the tables below.
|
|
|
<p align="center"> |
|
<img src="assets/frontier.png" alt="Average Performance vs. Approximate Training FLOPs" width="700"> |
|
</p> |
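The quoted compute figure is consistent with the common 6·N·D back-of-envelope estimate for dense-transformer training FLOPs (an approximation, not necessarily the exact accounting used for the plot above):

```python
# Rough training-compute estimate: FLOPs ≈ 6 * parameters * training tokens
params = 7.76e9   # 7.76B parameters (see model details below)
tokens = 2e12     # 2T training tokens seen
flops = 6 * params * tokens
print(f"{flops:.2e}")  # ~9.31e+22, matching the ~9.3×10²² figure above
```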
|
|
|
- Type: Causal Language Model |
|
- Training Stage: Pre-training & Post-training |
|
- Architecture: Transformer Decoder with RoPE, SwiGLU, RMSNorm |
|
- Number of Parameters: 7.76B |
|
- Number of Layers: 32 |
|
- Number of Attention Heads: 32 |
|
- Context Length: 4,096 |
|
- Number of Tokens Seen: 2T
|
- Vocab Size: 128,128 |
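These architecture details can be cross-checked against the released configuration. Below is a minimal sketch that assumes Llama-style config field names (`num_hidden_layers`, `num_attention_heads`, `max_position_embeddings`, `vocab_size`); the exact keys depend on the config class shipped with the checkpoint.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("trillionlabs/Trillion-7B-preview")

# Field names assume a Llama-style config; fall back to None if a key differs.
print("layers:         ", getattr(config, "num_hidden_layers", None))
print("attention heads:", getattr(config, "num_attention_heads", None))
print("context length: ", getattr(config, "max_position_embeddings", None))
print("vocab size:     ", getattr(config, "vocab_size", None))
```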
|
|
|
|
|
## Quickstart |
|
|
|
Here is a code snippet that demonstrates how to load the tokenizer and model and generate text using `apply_chat_template`.
|
|
|
```python |
|
import torch |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model_name = "trillionlabs/Trillion-7B-preview" |
|
|
|
# Load the model in bfloat16 and let device_map="auto" place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
|
model_name, |
|
torch_dtype=torch.bfloat16, |
|
device_map="auto" |
|
) |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
|
prompt = "Tell me a hilarious knock knock joke." |
|
messages = [ |
|
{"role": "user", "content": prompt} |
|
] |
|
# Format the conversation with the model's chat template and append the generation prompt
text = tokenizer.apply_chat_template(
|
messages, |
|
tokenize=False, |
|
add_generation_prompt=True |
|
) |
|
model_inputs = tokenizer([text], return_tensors="pt").to(model.device) |
|
|
|
generated_ids = model.generate( |
|
model_inputs["input_ids"], |
|
attention_mask=model_inputs["attention_mask"], |
|
max_new_tokens=512 |
|
) |
|
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
|
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) |
|
] |
|
|
|
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] |
|
print(response) |
|
|
|
""" |
|
Sure! Here's a classic knock-knock joke that's guaranteed to make you chuckle: |
|
Knock, knock. |
|
Who's there? |
|
Lettuce. |
|
Lettuce who? |
|
Lettuce in, it's too cold out here! |
|
""" |
|
``` |
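For chat use you may prefer sampled decoding over the greedy default shown above. A minimal variation of the same `generate` call (the temperature/top_p values are illustrative, not an official recommendation):

```python
# Sampled decoding: illustrative settings, not the authors' recommended defaults
generated_ids = model.generate(
    model_inputs["input_ids"],
    attention_mask=model_inputs["attention_mask"],
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
```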
|
|
|
We also support vLLM integration. |
|
```bash |
|
vllm serve trillionlabs/Trillion-7B-preview --max-model-len 4096 |
|
``` |
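The served model can then be queried through vLLM's OpenAI-compatible API. A minimal sketch, assuming the default endpoint (`http://localhost:8000/v1`) and the `openai` Python client:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default; no real API key is required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="trillionlabs/Trillion-7B-preview",
    messages=[{"role": "user", "content": "Tell me a hilarious knock knock joke."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```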
|
|
|
## Evaluation |
|
|
|
We selected a wide variety of benchmarks that evaluate general reasoning, knowledge recall, coding, mathematical reasoning, and instruction-following capabilities, and evaluated Trillion-7B-preview alongside several leading large language models of similar size. Our model demonstrates especially strong performance on Korean benchmarks.
|
|
|
|
|
<details> |
|
<summary> Full evaluation settings </summary> |
|
|
|
| Benchmark | Language | Evaluation Setting | Metric | |
|
|:----------|:---------|:------------------|:-------| |
|
| **General Reasoning and Reading Comprehension** | | | | |
|
| • HellaSwag | English | 0-shot | accuracy | |
|
| • TruthfulQA_mc1 | English | 6-shot | accuracy | |
|
| • TruthfulQA_mc2 | English | 6-shot | accuracy | |
|
| • ARC:C | English | 0-shot | accuracy | |
|
| • HAERAE | Korean | 3-shot | accuracy | |
|
| • KoBEST | Korean | 5-shot | accuracy | |
|
| • BBH | English | 0-shot, CoT | accuracy | |
|
| • xwinograd_en | English | 0-shot | accuracy | |
|
| • xwinograd_jp | Japanese | 0-shot | accuracy | |
|
| • xwinograd_zh | Chinese | 0-shot | accuracy | |
|
| **Knowledge Recall** | | | | |
|
| • KMMLU | Korean | 5-shot | accuracy | |
|
| • MMLU | English | 5-shot | accuracy | |
|
| • Global-MMLU-Lite-en | English | 5-shot | accuracy | |
|
| • Global-MMLU-Lite-ko | Korean | 5-shot | accuracy | |
|
| • Global-MMLU-Lite-ja | Japanese | 5-shot | accuracy | |
|
| • Global-MMLU-Lite-zh | Chinese | 5-shot | accuracy | |
|
| **Coding** | | | | |
|
| • HumanEval | English | 0-shot, CoT | pass@1 | |
|
| • MBPP | English | 0-shot, CoT | pass@1 |
|
| **Mathematical Reasoning** | | | | |
|
| • GSM8k | English | 0-shot, CoT | exact-match | |
|
| • MATH | English | 0-shot, CoT | exact-match | |
|
| • GPQA | English | 4-shot | accuracy | |
|
| • HRM8k | Korean | 0-shot, CoT | exact-match | |
|
| **Instruction Following and Chat** | | | | |
|
| • IFEval | English | 0-shot | strict-average | |
|
| • koIFEval* | Korean | 0-shot | strict-average | |
|
| • MT-Bench** | English | LLM-as-a-judge (gpt-4o-2024-08-06) | LLM score | |
|
| • KO-MT-Bench** | Korean | LLM-as-a-judge (gpt-4o-2024-08-06) | LLM score | |
|
| • LogicKor** | Korean | LLM-as-a-judge (gpt-4o-2024-08-06) | LLM score | |
|
|
|
- *Note that koIFEval is our in-house evaluation benchmark for assessing instruction-following capabilities in Korean. |
|
- **Note that MT-Bench, KO-MT-Bench, and LogicKor use a 10-point scale. |
|
|
|
</details> |
|
|
|
### Benchmark Results |
|
|
|
Results are reported for the following models:

- Trillion-7B-preview (this model)
|
- [LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct) |
|
- [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) |
|
- [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)

- [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)

- [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0)

- [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
|
|
|
|
|
### General Reasoning and Factuality |
|
|
|
| Benchmark | Trillion-7B-preview | EXAONE-3.5-7.8B-Instruct | gemma-2-9b-it | Llama-3.1-8B-Instruct | Qwen2.5-7B-Instruct | SOLAR-10.7B-Instruct-v1.0 | Mistral-7B-Instruct-v0.3 | |
|
| --- | --- | --- | --- | --- | --- | --- | --- | |
|
| HellaSwag | 58.94 | 60.04 | 59.72 | 59.81 | 61.97 | 68.72 | 65.79 | |
|
| TruthfulQA_mc1 | 36.10 | 40.64 | 42.96 | 38.07 | 47.74 | 56.18 | 42.47 | |
|
| TruthfulQA_mc2 | 54.10 | 59.74 | 60.09 | 54.54 | 64.72 | 70.64 | 59.41 | |
|
| ARC:C | 54.44 | 56.40 | 62.97 | 53.58 | 52.99 | 60.07 | 58.11 | |
|
| HAERAE | 80.02 | 76.08 | 68.01 | 63.15 | 65.17 | 60.86 | 47.75 | |
|
| KoBEST | 79.61 | 78.57 | 79.98 | 70.09 | 79.24 | 75.20 | 66.50 | |
|
| KMMLU | 48.09 | 45.39 | 46.66 | 41.41 | 50.15 | 41.66 | 33.59 | |
|
| MMLU | 63.52 | 65.65 | 72.24 | 68.32 | 74.23 | 65.20 | 61.84 | |
|
| Global-MMLU-Lite-en | 67.75 | 69.50 | 76.25 | 67.50 | 77.25 | 71.75 | 65.50 | |
|
| Global-MMLU-Lite-ko | 60.75 | 60.00 | 64.25 | 54.00 | 59.25 | 53.75 | 43.00 | |
|
| Global-MMLU-Lite-ja | 60.75 | 45.75 | 66.50 | 54.50 | 65.75 | 50.75 | 50.00 | |
|
| Global-MMLU-Lite-zh | 59.50 | 50.00 | 63.75 | 60.25 | 68.75 | 57.00 | 47.25 | |
|
| BBH | 41.94 | 53.30 | 28.77 | 43.16 | 53.68 | 52.91 | 45.09 | |
|
| xwinograd_en | 87.78 | 87.10 | 89.55 | 88.09 | 85.63 | 87.35 | 88.39 | |
|
| xwinograd_jp | 79.98 | 74.45 | 80.92 | 76.02 | 72.89 | 72.58 | 70.70 | |
|
| xwinograd_zh | 73.81 | 69.44 | 68.06 | 76.19 | 81.55 | 74.60 | 71.83 | |
|
|
|
### Coding |
|
|
|
| Benchmark | Trillion-7B-preview | EXAONE-3.5-7.8B-Instruct | gemma-2-9b-it | Llama-3.1-8B-Instruct | Qwen2.5-7B-Instruct | SOLAR-10.7B-Instruct-v1.0 | Mistral-7B-Instruct-v0.3 | |
|
| --- | --- | --- | --- | --- | --- | --- | --- | |
|
| HumanEval | 55.48 | 79.26 | 60.98 | 67.68 | 81.71 | 34.76 | 36.59 | |
|
| MBPP | 40.40 | 61.40 | 8.40 | 39.20 | 51.00 | 29.40 | 36.00 | |
|
|
|
### Mathematical Reasoning |
|
|
|
| Benchmark | Trillion-7B-preview | EXAONE-3.5-7.8B-Instruct | gemma-2-9b-it | Llama-3.1-8B-Instruct | Qwen2.5-7B-Instruct | SOLAR-10.7B-Instruct-v1.0 | Mistral-7B-Instruct-v0.3 | |
|
| --- | --- | --- | --- | --- | --- | --- | --- | |
|
| GSM8k | 72.25 | 87.79 | 73.69 | 74.98 | 88.86 | 62.93 | 35.94 | |
|
| MATH | 32.70 | 70.68 | - | 38.30 | 71.50 | 14.38 | 12.12 | |
|
| GPQA | 32.81 | 38.61 | 36.83 | 30.58 | 34.15 | 28.35 | 32.59 | |
|
| HRM8k | 30.10 | 38.99 | 16.04 | - | 41.51 | 20.68 | 7.89 | |
|
|
|
### Instruction Following and Chat |
|
|
|
| Benchmark | Trillion-7B-preview | EXAONE-3.5-7.8B-Instruct | gemma-2-9b-it | Llama-3.1-8B-Instruct | Qwen2.5-7B-Instruct | SOLAR-10.7B-Instruct-v1.0 | Mistral-7B-Instruct-v0.3 | |
|
| --- | --- | --- | --- | --- | --- | --- | --- | |
|
| IFEval | 79.13 | 81.42 | 75.48 | 74.93 | 75.85 | 51.61 | 52.64 | |
|
| koIFEval | 66.58 | 54.65 | 43.30 | 36.07 | 48.55 | 26.12 | 34.22 | |
|
| MT-Bench | 7.00 | 8.15 | 7.81 | 6.32 | 7.86 | 6.76 | 6.84 | |
|
| KO-MT-Bench | 6.27 | 8.13 | 7.01 | 4.27 | 6.31 | 2.89 | 4.07 | |
|
| LogicKor | 8.14 | 9.25 | 8.33 | 6.45 | 7.99 | 1.85 | 4.76 |
|
|
|
|
|
|
|
|
|
## Limitations |
|
|
|
- Language Support: The model is optimized for English, Korean, Japanese, and Chinese. Usage with other languages may result in degraded performance. |
|
- Knowledge Cutoff: The model's information is limited to data available up to August 2023. |
|
- Safety Mechanisms: This release does not yet include comprehensive safety features. Future updates will address this area. |
|
- Release Status: This is a preliminary release version with planned enhancements and updates forthcoming. |
|
|
|
|
|
## License |
|
This model repository is licensed under the Apache-2.0 License. |
|
|
|
|
|
## Citation |
|
``` |
|
@article{trillion7Bpreview, |
|
title={Trillion-7B-preview}, |
|
author={trillionlabs}, |
|
year={2025}, |
|
url={https://huggingface.co/trillionlabs/Trillion-7B-preview} |
|
} |
|
``` |
|
|
|
``` |
|
@misc{han2025trillion7btechnicalreport, |
|
title={Trillion 7B Technical Report}, |
|
author={Sungjun Han and Juyoung Suk and Suyeong An and Hyungguk Kim and Kyuseok Kim and Wonsuk Yang and Seungtaek Choi and Jamin Shin}, |
|
year={2025}, |
|
eprint={2504.15431}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2504.15431}, |
|
} |
|
``` |
|
## Contact |
|
For inquiries, please contact: [email protected] |