Qwen3-Embedding-0.6B Encoder Router for WildChat Dataset V1.1
This is a trained encoder router model based on Qwen/Qwen3-Embedding-0.6B that selects between different language models for a given query. The router was trained on the WildChat_0_to_50K_Routing_Dataset_V1.1 from the Zubi collection, which contains real-world conversational data drawn from diverse user interactions.
Model Description
- Base Model: Qwen/Qwen3-Embedding-0.6B
- Training Dataset: hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1
- Routing Target: Llama family models only
- Loss Function: Focal Loss (α=1.0, γ=2.0)
Dataset Information
The WildChat_0_to_50K_Routing_Dataset_V1.1 is a comprehensive routing dataset containing real-world conversational data. This dataset includes:
- Real-world conversations: Diverse user interactions covering various topics and domains
- Natural language queries: Authentic user questions and requests
- Multi-model evaluations: Performance assessments across different Llama family models
- Conversational context: Rich dialogue contexts that reflect actual usage patterns
This dataset enables the router to make informed decisions for conversational AI applications, helping select the most appropriate model for different types of user interactions.
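To get a feel for the data, the dataset can be loaded directly from the Hugging Face Hub with the datasets library. The snippet below is a minimal sketch; the split names and column layout are not documented here, so inspect them before wiring the data into training.

from datasets import load_dataset

# Pull the routing dataset from the Hugging Face Hub
dataset = load_dataset("hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1")

# Inspect available splits and their columns before use
for split_name, split in dataset.items():
    print(split_name, len(split), split.column_names)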
Training Configuration
{
"model_name": "Qwen/Qwen3-Embedding-0.6B",
"max_length": 2048,
"mlp_hidden_dims": [
1024,
512,
256
],
"num_epochs": 30,
"batch_size": 64,
"learning_rate": 3e-05,
"warmup_ratio": 0.1,
"weight_decay": 0.01,
"dropout_rate": 0.1,
"gradient_accumulation_steps": 2,
"max_grad_norm": 1.0,
"use_amp": true,
"dataset_paths": [
"hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1"
],
"max_rows": null,
"use_pareto_optimal": false,
"use_cheapest_best": false,
"single_best_model": false,
"filter_solvable": false,
"excluded_models": [],
"llama_family_only": true,
"output_dir": "checkpoints/WildChat_0_to_50K_Routing_Dataset_V1.1/Qwen3-Embedding-0.6B_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL",
"use_wandb": true,
"early_stopping_patience": 3,
"early_stopping_min_delta": 1e-05,
"loss_type": "focal",
"focal_alpha": 1.0,
"focal_gamma": 2.0,
"temperature": 1.0,
"seed": 42
}
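One detail worth noting: with gradient accumulation, the optimizer sees a larger effective batch than the per-step batch_size. A small sketch, assuming the configuration above is saved as config.json next to the checkpoint:

import json

# Load the training configuration saved alongside the checkpoint
with open("config.json") as f:
    config = json.load(f)

# Effective batch size per optimizer update:
# 64 (batch_size) x 2 (gradient_accumulation_steps) = 128
effective_batch = config["batch_size"] * config["gradient_accumulation_steps"]
print(f"Effective batch size: {effective_batch}")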
Training Command
python3 train_encoder_router.py \
--model_name Qwen/Qwen3-Embedding-0.6B \
--dataset_paths hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1 \
--output_dir checkpoints/WildChat_0_to_50K_Routing_Dataset_V1.1/Qwen3-Embedding-0.6B_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL \
--use_wandb \
--num_epochs 30 \
--batch_size 64 \
--gradient_accumulation_steps 2 \
--llama_family_only \
--loss_type focal \
--focal_alpha 1.0 \
--focal_gamma 2.0 \
--learning_rate 3e-5 \
--early_stopping_patience 3 \
--early_stopping_min_delta 0.00001
Evaluation Command
python3 evaluation/eval_encoder_router.py \
--model_path training/checkpoints/WildChat_0_to_50K_Routing_Dataset_V1.1/Qwen3-Embedding-0.6B_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL/best_model_20250626_150219.pt \
--config_path training/checkpoints/WildChat_0_to_50K_Routing_Dataset_V1.1/Qwen3-Embedding-0.6B_3e5_2048_30ep_LLAMA_FAMILY_ONLY_FOCAL/config.json \
--eval_dataset_path hazyresearch/WildChat_0_to_50K_Routing_Dataset_V1.1 \
--shuffle_eval
How It Works
This encoder router uses a transformer encoder with an MLP classification head to predict which model will perform best on a given conversational query. The training process involves:
- Multi-label Classification: The model learns to predict correctness probabilities for multiple target models simultaneously
- Focal Loss Training: Uses focal loss to handle class imbalance and focus on hard examples (a minimal sketch of the head and loss follows this list)
- Llama Family Focus: Specialized for routing among Llama family models
- Early Stopping: Training with patience-based early stopping to prevent overfitting
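The exact implementation lives in the Zubi repository; the snippet below is only a minimal sketch of how the pieces could fit together, assuming a pooled Qwen3 embedding as input, the configured hidden dims [1024, 512, 256], dropout 0.1, and a multi-label focal loss with α=1.0, γ=2.0.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPRouterHead(nn.Module):
    """MLP classification head over a pooled encoder embedding (sketch)."""
    def __init__(self, embed_dim, num_models, hidden_dims=(1024, 512, 256), dropout=0.1):
        super().__init__()
        layers, in_dim = [], embed_dim
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = h
        layers.append(nn.Linear(in_dim, num_models))  # one logit per candidate model
        self.mlp = nn.Sequential(*layers)

    def forward(self, pooled_embedding):
        return self.mlp(pooled_embedding)  # logits of shape (batch, num_models)

def focal_loss(logits, targets, alpha=1.0, gamma=2.0):
    """Multi-label focal loss: down-weights easy examples via the (1 - p_t)**gamma factor."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # model's probability for the true label
    return (alpha * (1 - p_t) ** gamma * bce).mean()

At inference time, a sigmoid over the logits gives per-model correctness probabilities, and the router selects the highest-scoring Llama family model.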
Usage
To use this model with the Zubi routing system:
from routing.classes.EncoderRouter import RouterEvaluator
import torch
# Load the trained router
model_path = "path/to/best_model_20250626_150219.pt"
config_path = "path/to/config.json"
# Initialize evaluator
evaluator = RouterEvaluator(model_path, config_path)
# Route a conversational query
query = "Can you help me plan a weekend trip to Paris?"
selected_model = evaluator.predict_best_model(query)
print(f"Selected model: {selected_model}")
Repository Structure
├── best_model_20250626_150219.pt   # Trained model checkpoint
├── config.json                     # Training configuration
└── README.md                       # This file
License
This model is released under the Apache 2.0 License.
More Information
For more details about the encoder router system, training procedures, and evaluation methods, please refer to the Encoder Router README in the Zubi repository.