# Qwen3-72B-Synthesis
This still doesn't work; I'm trying to fix it.
A Qwen3-Architecture 72B Model Forged from `Qwen3-32B` and `Qwen2.5-72B-Instruct`.
## Model Description
Qwen3-72B-Synthesis is an experimental, 80-layer, 72-billion-parameter large language model. It represents a novel approach to model creation, designed to produce a model with the pure, modern Qwen3 architecture while inheriting the vast, high-quality knowledge of the 72B-scale Qwen2.5-Instruct model.
This was not a simple merge. It was a multi-phase surgical procedure involving dimensional up-scaling, architectural alignment, and a strategic "knowledge transplant" using MergeKit. The result is a unique checkpoint that serves as an ideal starting point for further fine-tuning.
The core philosophy was to use `Qwen/Qwen3-32B` as the architectural "foundation" and `Qwen/Qwen2.5-72B-Instruct` as the "knowledge donor."
## Model Details
- Architecture: Qwen3 (RMSNorm, SwiGLU, no biases, includes `q_norm` and `k_norm`)
- Parameters: ~72 billion
- Layers: 80
- Foundation: `Qwen/Qwen3-32B`
- Donor: `Qwen/Qwen2.5-72B-Instruct`
- Tokenizer: `Qwen/Qwen3-32B` tokenizer (vocab_size: 151936)
## Model Creation Process
The creation of this model was a deliberate, three-phase process designed to overcome significant architectural incompatibilities.
### Phase 1: Foundation Upscaling
First, the `Qwen/Qwen3-32B` model (64 layers, hidden dimension 5120) was up-scaled to match the target 72B dimensions. This was done with a self-interpolation script: new dimensions were created by averaging different slices of the existing weights, rather than by simple tiling. The result was `Qwen3-32B-Upscaled`, a 64-layer model with the correct 72B tensor shapes and the Qwen3 architecture.
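The upscaling script itself is not reproduced here, but the following minimal sketch illustrates the self-interpolation idea: a larger weight matrix is built by averaging pairs of evenly spaced slices of the original tensor instead of tiling it. The target width of 8192 (Qwen2.5-72B's hidden size) and the exact blending scheme are assumptions for illustration only.

```python
import torch

def upscale_matrix(w: torch.Tensor, out_rows: int, out_cols: int) -> torch.Tensor:
    """Grow a 2-D weight tensor by averaging pairs of evenly spaced source slices."""
    def blend_indices(src: int, dst: int):
        # Two sets of evenly spaced source indices; averaging the rows/cols they
        # select interpolates between neighbours rather than tiling blocks.
        lo = torch.linspace(0, src - 1, dst).floor().long()
        hi = torch.linspace(0, src - 1, dst).ceil().long()
        return lo, hi

    lo, hi = blend_indices(w.shape[0], out_rows)
    w = 0.5 * (w[lo, :] + w[hi, :])   # grow the row dimension
    lo, hi = blend_indices(w.shape[1], out_cols)
    w = 0.5 * (w[:, lo] + w[:, hi])   # grow the column dimension
    return w

# Example: grow a 5120x5120 Qwen3-32B projection to the 8192x8192 shape
# expected at the 72B scale (8192 is Qwen2.5-72B's hidden size).
proj = torch.randn(5120, 5120)
print(upscale_matrix(proj, 8192, 8192).shape)  # torch.Size([8192, 8192])
```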
### Phase 2: Donor Alignment
The `Qwen/Qwen2.5-72B-Instruct` model was architecturally incompatible with the Qwen3 target. To solve this, a new donor model, `Qwen2.5-72B-Instruct-Aligned`, was created. This process, illustrated by the sketch after the list, involved:
- Creating an empty 80-layer model shell with the pure Qwen3 architecture.
- Surgically removing all `.bias` tensors from the Qwen2.5 weights.
- Truncating the Qwen2.5 embedding and language-model-head layers from a vocabulary of 152064 to match Qwen3's 151936.
- Loading the modified Qwen2.5 weights into the pure Qwen3 shell, resulting in a perfectly compatible donor model.
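A rough sketch of the bias-removal and vocabulary-truncation steps is shown below. It is illustrative only: the file path is hypothetical, and a real 72B checkpoint is sharded across many safetensors files rather than stored in one.

```python
from safetensors.torch import load_file, save_file

QWEN3_VOCAB = 151936  # Qwen3 tokenizer vocabulary size

# Hypothetical single shard; a real checkpoint would be processed shard by shard.
state_dict = load_file("qwen2.5-72b-instruct/model.safetensors")

aligned = {}
for name, tensor in state_dict.items():
    if name.endswith(".bias"):
        continue  # Qwen3 uses no bias terms, so the donor's biases are discarded
    if name.endswith(("embed_tokens.weight", "lm_head.weight")):
        # Keep only the first 151936 token rows (Qwen2.5 has 152064).
        tensor = tensor[:QWEN3_VOCAB, :].clone()
    aligned[name] = tensor

save_file(aligned, "qwen2.5-72b-instruct-aligned/model.safetensors")
```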
### Phase 3: Knowledge Transplant via MergeKit
With two architecturally compatible models, the final merge was performed using MergeKit. A "Knowledge Bridge" strategy was employed to transplant a stable reasoning core from the donor while blending the rest.

The following MergeKit configuration was used:
```yaml
merge_method: linear
base_model: ./Qwen3-32B-Upscaled
dtype: bfloat16
slices:
  # Slice 1: Blend the bottom 32 layers
  - merge_method: linear
    sources:
      - model: ./Qwen3-32B-Upscaled
        layer_range: [0, 32]
        parameters:
          weight: 0.5
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [0, 32]
        parameters:
          weight: 0.5
  # Slice 2: The "Knowledge Bridge" - transplant a pure block from the donor
  - merge_method: passthrough
    sources:
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [32, 48]
  # Slice 3: Blend the top layers
  - merge_method: linear
    sources:
      - model: ./Qwen3-32B-Upscaled
        layer_range: [32, 64]
        parameters:
          weight: 0.5
      - model: ./Qwen2.5-72B-Instruct-Aligned
        layer_range: [48, 80]
        parameters:
          weight: 0.5
tokenizer_source: ./Qwen3-32B-Upscaled
```
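Assuming a standard MergeKit installation, a configuration like this is typically executed with the `mergekit-yaml` command-line entry point, pointing it at the saved config file and an output directory.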
## How to Use
This model uses the standard Qwen ChatML prompt format.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/Qwen3-72B-Synthesis"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the importance of the LLaMA paper in one paragraph."},
]

# Render the messages with the tokenizer's ChatML template and append the
# assistant generation prompt.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
)
# Strip the prompt tokens so only the newly generated completion is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
## Intended Use and Limitations
This is an experimental model and should be considered a high-quality checkpoint, not a finished product.
- Fine-tuning is highly recommended. While it inherits knowledge from a powerful instruction model, the merging process can create slight incoherence between layers. A round of fine-tuning on a high-quality instruction dataset is necessary to harmonize the weights and unlock its full potential (see the sketch after this list).
- The model may exhibit unexpected behaviors, including repetitiveness or nonsensical outputs, prior to fine-tuning.
- This model has not been aligned for safety and may produce problematic, biased, or otherwise undesirable content. The user assumes all responsibility for the output generated.
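One possible way to run the recommended harmonization fine-tune is a parameter-efficient pass with LoRA via the `peft` library. The sketch below is not part of this model card's recipe: the adapter targets, ranks, and training setup are assumptions, and any high-quality instruction dataset and trainer (e.g. `transformers.Trainer` or TRL's `SFTTrainer`) could be substituted.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the merged checkpoint as the base for a harmonization fine-tune.
model = AutoModelForCausalLM.from_pretrained(
    "cognitivecomputations/Qwen3-72B-Synthesis",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative LoRA settings (assumptions, not the authors' configuration).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then train on a high-quality instruction dataset with your preferred trainer.
```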
## Acknowledgements
This model would not have been possible without the foundational work of Alibaba Cloud on the Qwen models, and the powerful, flexible MergeKit toolkit created by Charles Goddard and Arcee.ai.