Llama-3-Swallow-Infused-R1776-70B

Overview

Llama-3-Swallow-Infused-R1776-70B is a 70B-parameter merged model built on Meta's Llama 3.3 architecture. It combines the distilled reasoning performance of r1-1776-distill-llama-70b with the enhanced instruction-following capabilities of the Swallow model, making it effective for instruction tasks in both English and Japanese.

The foundation of this model is perplexity-ai/r1-1776-distill-llama-70b, a distilled model fine-tuned for reasoning on top of Llama 3.3. To boost Japanese language proficiency and overall instruction alignment, we incorporated the ChatVector from tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4. Adding an instruction-tuned model's ChatVector to a reasoning-centric model is a novel strategy for enhancing multilingual reasoning capabilities.

Merge Methodology

This model was created using a weighted linear merge:

Llama-3-Swallow-Infused-R1776-70B =
  r1-1776-distill-llama-70b + 0.4 * (
    Swallow-70B-Instruct-v0.4 - Llama-3.3-70B-Instruct
  )
  • Base: perplexity-ai/r1-1776-distill-llama-70b
    • A distilled reasoning-focused model built on Meta Llama 3.3.
  • Delta: Difference between tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4 and meta-llama/Llama-3.3-70B-Instruct.
  • Merge Tool: MergeKit (an example configuration follows this list)
  • Scaling Factor: α = 0.4
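
The formula above corresponds to MergeKit's task_arithmetic method, which computes base + Σ weightᵢ × (modelᵢ − base): choosing meta-llama/Llama-3.3-70B-Instruct as the arithmetic base with weights 1.0 and 0.4 reproduces the recipe. The config below is a minimal sketch in that spirit, not the exact file used for this release.

  merge_method: task_arithmetic
  base_model: meta-llama/Llama-3.3-70B-Instruct
  models:
    - model: perplexity-ai/r1-1776-distill-llama-70b
      parameters:
        weight: 1.0
    - model: tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4
      parameters:
        weight: 0.4
  dtype: bfloat16

Running mergekit-yaml on such a config processes the checkpoints shard by shard, so the three 70B models never need to be resident in memory at once.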

Before merging, we performed vocabulary alignment to ensure consistency between the merge components. This step uses yasu-oh/merge_tools to align the vocabulary of the added model with the base model's tokenizer, preventing token-ID mismatches and preserving quality in the merged model.
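
For intuition, vocabulary alignment amounts to re-indexing the added model's embedding matrix by the base tokenizer's token IDs. The sketch below is a conceptual illustration only, not the merge_tools implementation; the mean-embedding fallback for unmatched tokens is our assumption.

  # Conceptual sketch of vocabulary alignment (illustrative; not the
  # merge_tools API). Model IDs are real; the logic is an assumption.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  base_tok = AutoTokenizer.from_pretrained("perplexity-ai/r1-1776-distill-llama-70b")
  added_id = "tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4"
  added_tok = AutoTokenizer.from_pretrained(added_id)
  added = AutoModelForCausalLM.from_pretrained(added_id, torch_dtype=torch.bfloat16)

  # Re-index the added model's input embeddings by the base tokenizer's IDs:
  # copy the matching row when the token string exists in both vocabularies,
  # otherwise fall back to the mean embedding.
  old = added.get_input_embeddings().weight.data
  new = old.mean(dim=0, keepdim=True).repeat(len(base_tok), 1)
  added_vocab = added_tok.get_vocab()
  for token, base_id in base_tok.get_vocab().items():
      if token in added_vocab:
          new[base_id] = old[added_vocab[token]]

  added.resize_token_embeddings(len(base_tok))
  added.get_input_embeddings().weight.data.copy_(new)
  # The output embedding (lm_head) needs the same re-indexing when untied.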

This methodology retains the reasoning backbone of R1776 while integrating Swallow's enhancements in instruction tuning and Japanese language support.
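
At the tensor level, the merge is plain weight arithmetic. The following sketch shows the equivalent operation outside MergeKit, assuming the vocabulary-alignment step above has already been applied (illustrative only: materializing three 70B checkpoints this way requires several hundred GB of memory).

  import torch
  from transformers import AutoModelForCausalLM

  ALPHA = 0.4  # scaling factor from the merge recipe

  def load(name):
      return AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

  base = load("perplexity-ai/r1-1776-distill-llama-70b")  # reasoning backbone
  swallow = load("tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4")
  instruct = load("meta-llama/Llama-3.3-70B-Instruct")

  swallow_sd, instruct_sd = swallow.state_dict(), instruct.state_dict()
  with torch.no_grad():
      for name, w in base.state_dict().items():
          # ChatVector: Swallow's instruction/Japanese delta relative to the
          # shared Llama 3.3 Instruct ancestor, scaled and added to the base.
          w += ALPHA * (swallow_sd[name] - instruct_sd[name])

  base.save_pretrained("Llama-3-Swallow-Infused-R1776-70B")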

Languages

  • English
  • Japanese

Key Features

  • Bilingual support: robust performance for both English and Japanese tasks.
  • Enhanced reasoning and instruction-following capabilities.
  • Novel use of ChatVector addition from instruction-tuned models to a reasoning-centric base.

Recommended Parameters

  • temperature: 0.6
  • top_p: 0.95
  • top_k: 40
  • min_p: 0.0
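
These defaults map directly onto common inference stacks. Below is a minimal sketch using vLLM; the choice of vLLM and the tensor_parallel_size are our assumptions, not a documented deployment.

  from vllm import LLM, SamplingParams

  llm = LLM(
      model="yasu-oh/Llama-3-Swallow-Infused-R1776-70B",
      tensor_parallel_size=8,  # assumption: eight GPUs for a 70B BF16 model
  )
  params = SamplingParams(temperature=0.6, top_p=0.95, top_k=40, min_p=0.0,
                          max_tokens=1024)

  # Japanese prompt: "What is the elevation of Mt. Fuji in meters?"
  out = llm.chat(
      [{"role": "user", "content": "富士山の標高をメートルで答えてください。"}],
      params,
  )
  print(out[0].outputs[0].text)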

License

This model is distributed under the Meta Llama 3 Community License. Please review and comply with its terms: https://www.llama.com/llama3/license/

Key Restrictions Include:

  • Do not use this model to improve competing large language models (LLMs).
  • When reusing this model, include the phrase: "Built with Meta Llama 3."
  • Organizations with more than 700 million monthly active users (MAU) require a separate license from Meta.
  • Model names must include “Llama 3”.

Citations

If you use this model, please cite the original works: meta-llama/Llama-3.3-70B-Instruct, tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4, and perplexity-ai/r1-1776-distill-llama-70b.
