Arctic Text2SQL: ExCoT

Snowflake’s AI research team introduces ExCoT, the first model in the Arctic Text2SQL family. ExCoT is a novel framework that combines chain-of-thought (CoT) prompting with SQL execution-based direct preference optimization (DPO), using execution results rather than human preferences as the feedback signal. This enables scalable, high-quality model optimization without expensive human annotations.
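To make the feedback loop concrete, here is a minimal sketch of how execution results could be turned into preference pairs for DPO. It is not Snowflake's pipeline: the helper names (`run_sql`, `build_preference_pairs`), the toy SQLite schema, and the candidate completions are all hypothetical, and it assumes a candidate counts as correct when its result rows match the gold query's rows regardless of order.

```python
import sqlite3
from itertools import product


def run_sql(conn, sql):
    """Execute sql and return its rows as a sorted list, or None if it fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def build_preference_pairs(conn, question, gold_sql, candidates):
    """Label candidates by execution result, then pair correct with incorrect ones.

    candidates: list of (cot_text, sql) tuples sampled from a model
    (a hypothetical format chosen for this sketch).
    Returns a list of dicts with "prompt", "chosen", and "rejected" keys.
    """
    gold_rows = run_sql(conn, gold_sql)
    correct, incorrect = [], []
    for cot, sql in candidates:
        rows = run_sql(conn, sql)
        completion = f"{cot}\nFinal SQL: {sql}"
        if rows is not None and rows == gold_rows:
            correct.append(completion)
        else:
            # Wrong results and execution errors are both treated as rejected.
            incorrect.append(completion)
    return [
        {"prompt": question, "chosen": c, "rejected": r}
        for c, r in product(correct, incorrect)
    ]


# Toy in-memory database standing in for a benchmark database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ana', 'eng', 120), ('Bo', 'eng', 100), ('Cy', 'ops', 90);
""")

question = "How many employees work in the eng department?"
gold_sql = "SELECT COUNT(*) FROM employees WHERE dept = 'eng'"
candidates = [
    ("Filter on dept, then count rows.", "SELECT COUNT(*) FROM employees WHERE dept = 'eng'"),
    ("Count all rows.", "SELECT COUNT(*) FROM employees"),
    ("Use a misspelled column.", "SELECT COUNT(*) FROM employees WHERE department = 'eng'"),
]

pairs = build_preference_pairs(conn, question, gold_sql, candidates)
print(len(pairs))
```

Each resulting dictionary pairs one completion whose SQL reproduced the gold result with one that did not, so no human ranking is needed to decide which completion is preferred.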

Based on our internal testing, ExCoT delivered state-of-the-art results on the BIRD-test benchmark, achieving best-in-class performance in the single-model, single-inference category using only public datasets (BIRD and Spider) and no additional Text2SQL data:

  • Llama-3.1-Arctic-ExCoT-70B improved execution accuracy on the BIRD-dev set from the base model’s 57.37% to 68.51%. Qwen-2.5-coder-Arctic-ExCoT-32B achieved similarly strong gains.

  • Both models outperformed other well-known frontier general-purpose models by more than 10 points of execution accuracy.
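Execution accuracy, the metric quoted above, is the percentage of questions for which the predicted SQL returns the same result as the reference SQL when both are run against the database. The sketch below illustrates the idea on a toy SQLite database; the function name `execution_accuracy`, the schema, and the set-based row comparison are illustrative assumptions, and the official BIRD evaluation script may differ in details.

```python
import sqlite3


def execution_accuracy(conn, examples):
    """Return the percentage of (predicted_sql, gold_sql) pairs whose result sets match."""
    hits = 0
    for predicted_sql, gold_sql in examples:
        gold_rows = set(conn.execute(gold_sql).fetchall())
        try:
            predicted_rows = set(conn.execute(predicted_sql).fetchall())
        except sqlite3.Error:
            continue  # a prediction that fails to execute counts as a miss
        if predicted_rows == gold_rows:
            hits += 1
    return 100.0 * hits / len(examples)


# Toy in-memory database (hypothetical schema for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'acme', 50.0), (2, 'acme', 70.0), (3, 'zen', 20.0);
""")

examples = [
    # (predicted SQL, gold SQL)
    ("SELECT customer FROM orders WHERE total > 60",
     "SELECT customer FROM orders WHERE total > 60.0"),
    ("SELECT SUM(total) FROM orders",
     "SELECT SUM(total) FROM orders WHERE customer = 'acme'"),
    ("SELECT COUNT(*) FROM order",
     "SELECT COUNT(*) FROM orders"),
    ("SELECT id FROM orders ORDER BY id DESC",
     "SELECT id FROM orders"),
]

accuracy = execution_accuracy(conn, examples)
print(f"{accuracy:.2f}")
```

Because rows are compared as sets, a prediction that differs from the reference only in row order still counts as correct, while a query that raises an error or returns different values does not.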

For more details about ExCoT and how to use it:
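As a starting point for local inference, the sketch below loads the 32B checkpoint with the Hugging Face `transformers` library. The repository id `Snowflake/Qwen-2.5-coder-Arctic-ExCoT-32B` is taken from this model card; the prompt layout produced by `build_prompt` is a hypothetical placeholder rather than the documented ExCoT prompt format, and `device_map="auto"` assumes the `accelerate` package is installed and that enough GPU memory is available for a 32.8B-parameter model.

```python
MODEL_ID = "Snowflake/Qwen-2.5-coder-Arctic-ExCoT-32B"


def build_prompt(schema: str, question: str) -> str:
    """Assemble a plain-text prompt (hypothetical layout, not the official template)."""
    return (
        "Database schema:\n"
        f"{schema.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Think step by step, then give the final SQL query.\n"
    )


def generate_sql(schema: str, question: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a completion for one question."""
    # Imported here so build_prompt can be used without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(schema, question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = build_prompt(
    "CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);",
    "How many employees work in the eng department?",
)
print(prompt)
```

Calling `generate_sql` downloads and loads the full checkpoint on every call, which keeps the sketch short; in practice the tokenizer and model would be loaded once and reused.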

Evaluation results

| Model | BIRD Ex% Dev | BIRD Ex% Test |
|---|---|---|
| Arctic-ExCoT-70B (LLaMA 3.1 70B) | 68.51 | 68.53 |
| Arctic-ExCoT-32B (Qwen-2.5-Coder 32B) | 68.25 | 68.19 |
| XiYanSQL-QwenCoder* | 67.01 | 69.03 |
| OpenAI GPT-4o | 54.04 | |
| OpenAI GPT-4 | 46.35 | 54.89 |
| Anthropic Claude 3.5-Sonnet | 50.13 | |
| Claude-2 | 42.70 | 49.02 |
| OpenAI o1-mini | 52.41 | |
| OpenAI o3-mini | 53.72 | |
| Mistral-large-2407 (123B) | 53.52 | 55.84 |
| DeepSeek-V2 (236B) | 56.13 | 56.68 |

Top single-model, single-inference results on the BIRD leaderboard (as of March 25, 2025). *XiYanSQL-QwenCoder: the reported numbers have proven difficult to reproduce [1][2].

Format: Safetensors · Model size: 32.8B params · Tensor type: F32

Model tree for Snowflake/Qwen-2.5-coder-Arctic-ExCoT-32B

Base model: Qwen/Qwen2.5-32B