This is trashpanda-org/QwQ-32B-Snowdrop-v0 with the embed_tokens and lm_head tensors replaced with the correctly-sized ones from Qwen/Qwen2.5-32B-Instruct.
(Why the instruct model and not QwQ? Because that's the tokenizer trashpanda was aiming for.)
At the time of posting there's an ongoing issue where the Qwen2.5 embedding tensors have dimension 152064 (matching the vocab size stated in the config), but the tokenizer and vocab actually shipped define fewer tokens (seemingly Qwen pre-initialized extra embedding space for tokens added later). In some LLM software (e.g. Axolotl, Mergekit) this triggers an automated check that, seeing the vocab size is smaller than the embedding size, resizes the embeddings down to match, which breaks compatibility in some places.
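A quick way to see the mismatch described above (a minimal sketch; the exact token count depends on the tokenizer files shipped with the repo):

```python
from transformers import AutoConfig, AutoTokenizer

repo = "Qwen/Qwen2.5-32B-Instruct"
config = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

# Embedding matrix size declared in the config vs. tokens the tokenizer actually defines
print("config.vocab_size:", config.vocab_size)  # 152064
print("len(tokenizer):   ", len(tokenizer))     # fewer than 152064
```

The script below undoes the unwanted resize by restoring the full-size embeddings from the base instruct model: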
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# --- 1. Load Both Models ---
base_model_name = "Qwen/Qwen2.5-32B-Instruct"
finetuned_model_name = "trashpanda-org/QwQ-32B-Snowdrop-v0"
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.bfloat16)
finetuned_model = AutoModelForCausalLM.from_pretrained(finetuned_model_name, torch_dtype=torch.bfloat16)

# --- 2. Get Embedding Layers and Resize the Fine-Tuned Model's Embeddings ---
base_embedding_layer = base_model.get_input_embeddings()
finetuned_model.resize_token_embeddings(base_embedding_layer.weight.size(0))  # Resize so copying works
finetuned_embedding_layer = finetuned_model.get_input_embeddings()

# --- 3. Replace Embedding Layer (The Core Operation) ---
with torch.no_grad():  # Very important: no gradient tracking during this operation!
    finetuned_embedding_layer.weight.copy_(base_embedding_layer.weight)
print(finetuned_model.get_input_embeddings().weight.shape)  # Verify this is the size we want

# --- 4. Save the Modified Model ---
output_dir = "QwQ-32B-Snowdrop-v0-EmbedFix"
base_tokenizer = AutoTokenizer.from_pretrained(base_model_name)  # Get the tokenizer, too
finetuned_model.save_pretrained(output_dir)
base_tokenizer.save_pretrained(output_dir)

# --- 5. (Optional, but Recommended) Test ---
# Load and test the modified model
modified_model = AutoModelForCausalLM.from_pretrained(output_dir, torch_dtype=torch.bfloat16)
modified_tokenizer = AutoTokenizer.from_pretrained(output_dir)
test_text = "This is a test sentence."
inputs = modified_tokenizer(test_text, return_tensors="pt")
with torch.no_grad():
    outputs = modified_model(**inputs)  # Forward pass
print(outputs)  # Success: no errors running the new model
```
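The description above also mentions lm_head. With untied input/output embeddings, `resize_token_embeddings` resizes the output head as well, but the script only copies the input embeddings from the base model. A sketch of the analogous lm_head copy (an assumed extra step, not part of the original script, to be run between steps 3 and 4):

```python
# Hypothetical extra step (not in the original script): also copy the lm_head
# weights from the base model, mirroring the embed_tokens copy above.
# Assumes the input and output embeddings are untied in this model's config.
base_lm_head = base_model.get_output_embeddings()
finetuned_lm_head = finetuned_model.get_output_embeddings()
with torch.no_grad():
    finetuned_lm_head.weight.copy_(base_lm_head.weight)
print(finetuned_model.get_output_embeddings().weight.shape)  # Should match the input embedding shape
```

If the embeddings were tied instead, the embed_tokens copy in step 3 would already cover the output head.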