Qwen3-Embedding-0.6B Encoder Router for WildChat Dataset V1.1
This is a trained encoder router model based on Qwen/Qwen3-Embedding-0.6B that selects between different language models for a given query. The router was trained on the WildChat_0_to_50K_Routing_Dataset_V1.1 from the Zubi collection, which contains real-world conversational data drawn from diverse user interactions.
Model Description
- Base Model: Qwen/Qwen3-Embedding-0.6B
- Training Dataset: hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1
- Routing Target: Llama family models only
- Loss Function: Focal Loss (α=1.0, γ=2.0)
Dataset Information
The WildChat_0_to_50K_Routing_Dataset_V1.1 is a comprehensive routing dataset containing real-world conversational data. This dataset includes:
- Real-world conversations: Diverse user interactions covering various topics and domains
- Natural language queries: Authentic user questions and requests
- Multi-model evaluations: Performance assessments across different Llama family models
- Conversational context: Rich dialogue contexts that reflect actual usage patterns
This dataset enables the router to make informed decisions for conversational AI applications, helping select the most appropriate model for different types of user interactions.
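To get a feel for the data, the dataset can be loaded directly from the Hugging Face Hub with the datasets library. The snippet below is a minimal sketch; the split names and column layout are not documented here, so inspect them before wiring the data into training.

from datasets import load_dataset

# Pull the routing dataset from the Hugging Face Hub
dataset = load_dataset("hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1")

# Inspect available splits and their columns before use
for split_name, split in dataset.items():
    print(split_name, len(split), split.column_names)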
Training Configuration
{
"model_name": "Qwen/Qwen3-Embedding-0.6B",
"max_length": 2048,
"mlp_hidden_dims": [
1024,
512,
256
],
"num_epochs": 30,
"batch_size": 64,
"learning_rate": 3e-05,
"warmup_ratio": 0.1,
"weight_decay": 0.01,
"dropout_rate": 0.1,
"gradient_accumulation_steps": 2,
"max_grad_norm": 1.0,
"use_amp": true,
"dataset_paths": [
"hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1"
],
"max_rows": null,
"use_pareto_optimal": false,
"use_cheapest_best": false,
"single_best_model": false,
"filter_solvable": false,
"excluded_models": [],
"llama_family_only": true,
"output_dir": "checkpoints/WildChat_0_to_50K_Routing_Dataset_V1.1/Qwen3-Embedding-0.6B_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL",
"use_wandb": true,
"early_stopping_patience": 3,
"early_stopping_min_delta": 1e-05,
"loss_type": "focal",
"focal_alpha": 1.0,
"focal_gamma": 2.0,
"temperature": 1.0,
"seed": 42
}
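One detail worth noting: with gradient accumulation, the optimizer sees a larger effective batch than the per-step batch_size. A small sketch, assuming the configuration above is saved as config.json next to the checkpoint:

import json

# Load the training configuration saved alongside the checkpoint
with open("config.json") as f:
    config = json.load(f)

# Effective batch size per optimizer update:
# 64 (batch_size) x 2 (gradient_accumulation_steps) = 128
effective_batch = config["batch_size"] * config["gradient_accumulation_steps"]
print(f"Effective batch size: {effective_batch}")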
Training Command
python3 train_encoder_router.py \
--model_name Qwen/Qwen3-Embedding-0.6B \
--dataset_paths hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1 \
--output_dir checkpoints/WildChat_0_to_50K_Routing_Dataset_V1.1/Qwen3-Embedding-0.6B_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL \
--use_wandb \
--num_epochs 30 \
--batch_size 64 \
--gradient_accumulation_steps 2 \
--llama_family_only \
--loss_type focal \
--focal_alpha 1.0 \
--focal_gamma 2.0 \
--learning_rate 3e-5 \
--early_stopping_patience 3 \
--early_stopping_min_delta 0.00001
Evaluation Command
python3 evaluation/eval_encoder_router.py \
--model_path training/checkpoints/WildChat_0_to_50K_Routing_Dataset_V1.1/Qwen3-Embedding-0.6B_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL/best_model_20250626_150219.pt \
--config_path training/checkpoints/WildChat_0_to_50K_Routing_Dataset_V1.1/Qwen3-Embedding-0.6B_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL/config.json \
--eval_dataset_path hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1 \
--shuffle_eval
How It Works
This encoder router uses a transformer encoder with an MLP classification head to predict which model will perform best on a given conversational query. The training process involves:
- Multi-label Classification: The model learns to predict correctness probabilities for multiple target models simultaneously
- Focal Loss Training: Uses focal loss to handle class imbalance and focus on hard examples (a minimal sketch of the head and loss follows this list)
- Llama Family Focus: Specialized for routing among Llama family models
- Early Stopping: Training with patience-based early stopping to prevent overfitting
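The exact implementation lives in the Zubi repository; the snippet below is only a minimal sketch of how the pieces could fit together, assuming a pooled Qwen3 embedding as input, the configured hidden dims [1024, 512, 256], dropout 0.1, and a multi-label focal loss with α=1.0, γ=2.0.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPRouterHead(nn.Module):
    """MLP classification head over a pooled encoder embedding (sketch)."""
    def __init__(self, embed_dim, num_models, hidden_dims=(1024, 512, 256), dropout=0.1):
        super().__init__()
        layers, in_dim = [], embed_dim
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = h
        layers.append(nn.Linear(in_dim, num_models))  # one logit per candidate model
        self.mlp = nn.Sequential(*layers)

    def forward(self, pooled_embedding):
        return self.mlp(pooled_embedding)  # logits of shape (batch, num_models)

def focal_loss(logits, targets, alpha=1.0, gamma=2.0):
    """Multi-label focal loss: down-weights easy examples via the (1 - p_t)**gamma factor."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true label
    return (alpha * (1 - p_t) ** gamma * bce).mean()

At inference time, a sigmoid over the logits gives per-model correctness probabilities, and the router selects the highest-scoring Llama family model.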
Usage
To use this model with the Zubi routing system:
from routing.classes.EncoderRouter import RouterEvaluator
import torch
# Load the trained router
model_path = "path/to/best_model_20250626_150219.pt"
config_path = "path/to/config.json"
# Initialize evaluator
evaluator = RouterEvaluator(model_path, config_path)
# Route a conversational query
query = "Can you help me plan a weekend trip to Paris?"
selected_model = evaluator.predict_best_model(query)
print(f"Selected model: {selected_model}")
Repository Structure
├── best_model_20250626_150219.pt   # Trained model checkpoint
├── config.json                     # Training configuration
└── README.md                       # This file
License
This model is released under the Apache 2.0 License.
More Information
For more details about the encoder router system, training procedures, and evaluation methods, please refer to the Encoder Router README in the Zubi repository.