Arctic Text2SQL: ExCoT

Snowflake’s AI research team introduces ExCoT, the first model in the Arctic Text2SQL family. ExCoT is a novel framework that combines chain-of-thought (CoT) prompting with SQL execution-based Direct Preference Optimization (DPO), using execution results rather than human preferences as the feedback signal. This enables scalable, high-quality model optimization without expensive human annotation.
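To make the idea of execution-based feedback concrete, here is a minimal sketch of how preference pairs could be derived from SQL execution results instead of human labels. It is an illustration under stated assumptions, not the ExCoT pipeline itself: the helper names (`run_sql`, `build_preference_pairs`), the toy `singer` table, and the rule of pairing every matching candidate with every non-matching one are all hypothetical. Real candidates would also carry their chain-of-thought text, which is omitted here.

```python
import sqlite3
from itertools import product


def run_sql(conn, sql):
    """Execute a query; return its rows in a canonical order, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows, key=repr)  # order-insensitive comparison


def build_preference_pairs(conn, gold_sql, candidates):
    """Label candidates by execution result and pair chosen with rejected.

    Assumption: a candidate is "chosen" if its result matches the gold
    query's result, and "rejected" otherwise (including execution errors).
    """
    gold = run_sql(conn, gold_sql)
    if gold is None:
        raise ValueError("gold query failed to execute")
    chosen, rejected = [], []
    for cand in candidates:
        (chosen if run_sql(conn, cand) == gold else rejected).append(cand)
    return [{"chosen": c, "rejected": r} for c, r in product(chosen, rejected)]


# Hypothetical toy database standing in for a BIRD or Spider schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (name TEXT, age INTEGER);
    INSERT INTO singer VALUES ('Ana', 31), ('Bo', 24), ('Cy', 45);
    """
)

gold_sql = "SELECT name FROM singer WHERE age > 30"
candidates = [
    "SELECT name FROM singer WHERE age >= 31",  # same rows as gold
    "SELECT name FROM singer WHERE age > 40",   # runs, but wrong rows
    "SELECT nme FROM singer WHERE age > 30",    # fails to execute
]

pairs = build_preference_pairs(conn, gold_sql, candidates)
for pair in pairs:
    print(pair)
```

Each resulting pair could then be fed to a DPO trainer as a preferred and a dispreferred completion for the same question.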

Based on our internal testing, ExCoT delivered state-of-the-art results on the BIRD-test benchmark, achieving best-in-class performance in the single-model, single-inference category using only public datasets (BIRD and Spider) and no additional Text2SQL data:

Llama-3.1-Arctic-ExCoT-70B improved execution accuracy on the BIRD-dev set from the base model’s 57.37% to 68.51%. Qwen-2.5-coder-Arctic-ExCoT-32B achieved similarly strong gains.
Both models outperformed well-known general-purpose frontier models by more than 10 points of execution accuracy.
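For readers unfamiliar with the metric, the sketch below shows one way execution accuracy can be computed: a prediction counts as correct when it returns the same rows as the gold query on the same database. The function names, the toy `orders` table, and the use of set comparison are assumptions made for illustration; the official BIRD evaluation script may differ in details. The last lines reproduce the point gain quoted above from the two reported numbers.

```python
import sqlite3


def same_result(conn, predicted_sql, gold_sql):
    """True if the predicted query runs and returns the same set of rows as gold."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    return set(predicted) == set(gold)


def execution_accuracy(conn, examples):
    """Percentage of (predicted, gold) pairs whose results match."""
    hits = sum(same_result(conn, pred, gold) for pred, gold in examples)
    return 100.0 * hits / len(examples)


# Hypothetical toy database and predictions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, 25.0), (3, 40.0);
    """
)

examples = [
    ("SELECT COUNT(*) FROM orders", "SELECT COUNT(id) FROM orders"),
    ("SELECT id FROM orders WHERE amount > 20",
     "SELECT id FROM orders WHERE amount >= 25"),
    ("SELECT MAX(amount) FROM orders", "SELECT MIN(amount) FROM orders"),
    ("SELECT total FROM orders", "SELECT amount FROM orders"),  # bad column
]

accuracy = execution_accuracy(conn, examples)
print(f"execution accuracy: {accuracy:.2f}%")

# Point gain on BIRD-dev reported for Llama-3.1-Arctic-ExCoT-70B.
gain = round(68.51 - 57.37, 2)
print(f"gain over base model: {gain} points")
```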

For more details about ExCoT and how to use it:

Evaluation results

Model	BIRD Dev EX (%)	BIRD Test EX (%)
Arctic-ExCoT-70B (LLaMA 3.1 70B)	68.51	68.53
Arctic-ExCoT-32B (Qwen-2.5-Coder 32B)	68.25	68.19
XiYanSQL-QwenCoder*	67.01	69.03
OpenAI GPT-4o	54.04	–
OpenAI GPT-4	46.35	54.89
Anthropic Claude 3.5-Sonnet	50.13	–
Claude-2	42.70	49.02
OpenAI o1-mini	52.41	–
OpenAI o3-mini	53.72	–
Mistral-large-2407 (123B)	53.52	55.84
DeepSeek-V2 (236B)	56.13	56.68

Top single-model, single-inference results on the BIRD leaderboard (as of March 25, 2025). *XiYanSQL-QwenCoder: the reported numbers have proven difficult to reproduce [1][2].
