Model Name (LiveBench),Organization,Global Average,Reasoning Average,Coding Average,Mathematics Average,Data Analysis Average,Language Average,IF Average,Model Link (LiveBench),Arena Rank (No Style Control),Arena Rank (With Style Control),Model Name (Arena),Arena Score,95% Confidence Interval,# of Votes,Model License,Model Knowledge Cutoff,Model Link (Arena)
o3-2025-04-16-high,OpenAI,80.71,93.33,76.71,85.0,67.02,76.0,86.17,https://openai.com/index/introducing-o3-and-o4-mini/,3.0,1.0,o3-2025-04-16,1413.0,+8/-7,6689.0,Proprietary,Unknown,https://openai.com/index/introducing-o3-and-o4-mini/
o3-2025-04-16-medium,OpenAI,79.25,91.0,77.86,80.66,68.19,73.48,84.32,https://openai.com/index/introducing-o3-and-o4-mini/,,,,,,,,,
gemini-2.5-pro-preview-05-06,Google,78.99,88.25,72.87,88.63,68.85,71.81,83.5,https://developers.googleblog.com/en/gemini-2-5-pro-io-improved-coding-performance/,1.0,1.0,Gemini-2.5-Pro-Preview-05-06,1446.0,+8/-9,4500.0,Proprietary,Unknown,http://aistudio.google.com/app/prompts/new_chat?model=gemini-2.5-pro-preview-05-06
o4-mini-2025-04-16-high,OpenAI,78.72,88.11,79.98,84.9,68.33,66.05,84.96,https://openai.com/index/introducing-o3-and-o4-mini/,11.0,6.0,o4-mini-2025-04-16,1351.0,+10/-7,5083.0,Proprietary,Unknown,https://openai.com/index/introducing-o3-and-o4-mini/
gemini-2.5-pro-preview-03-25,Google,76.69,87.53,71.08,89.16,62.47,69.31,80.59,https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/,1.0,1.0,Gemini-2.5-Pro-Exp-03-25,1437.0,+6/-5,12812.0,Proprietary,Unknown,http://aistudio.google.com/app/prompts/new_chat?model=gemini-2.5-pro-exp-03-25
claude-3-7-sonnet-20250219-thinking-64k,Anthropic,74.5,76.17,73.19,79.0,69.11,68.27,81.25,https://www.anthropic.com/news/claude-3-7-sonnet,27.0,12.0,Claude 3.7 Sonnet (thinking-32k),1300.0,+5/-5,12038.0,Proprietary,Unknown,https://www.anthropic.com/news/claude-3-7-sonnet
o4-mini-2025-04-16-medium,OpenAI,74.4,78.47,74.22,81.02,68.47,62.41,81.83,https://openai.com/index/introducing-o3-and-o4-mini/,,,,,,,,,
qwen3-235b-a22b-thinking,Alibaba,73.23,78.61,65.32,78.78,68.31,60.61,87.73,https://qwenlm.github.io/blog/qwen3/,13.0,17.0,Qwen3-235B-A22B,1343.0,+11/-9,3611.0,Apache 2.0,Unknown,https://qwenlm.github.io/blog/qwen3/
deepseek-r1,DeepSeek,72.49,77.17,74.98,77.91,69.63,54.77,80.51,https://huggingface.co/deepseek-ai/DeepSeek-R1,11.0,9.0,DeepSeek-R1,1358.0,+4/-4,18493.0,MIT,Unknown,https://api-docs.deepseek.com/news/news250120
qwen3-32b-thinking,Alibaba,71.03,77.75,64.24,75.58,68.29,55.15,85.17,https://qwenlm.github.io/blog/qwen3/,,,,,,,,,
grok-3-mini-beta-high,xAI,70.25,87.61,54.52,77.0,64.58,59.09,78.7,https://x.ai/blog/grok-3,4.0,6.0,Grok-3-Preview-02-24,1403.0,+4/-4,14843.0,Proprietary,Unknown,https://x.ai/blog/grok-3
gemini-2.5-flash-preview-04-17,Google,69.93,73.47,60.33,81.8,65.53,59.43,79.02,https://blog.google/products/gemini/gemini-2-5-flash-preview/,5.0,6.0,Gemini-2.5-Flash-Preview-04-17,1394.0,+7/-7,5959.0,Proprietary,Unknown,http://aistudio.google.com/app/prompts/new_chat?model=gemini-2.5-flash-preview-04-17
qwq-32b,Alibaba,69.5,76.72,61.36,76.08,69.53,51.48,81.83,https://qwenlm.github.io/blog/qwq-32b/,22.0,30.0,QwQ-32B,1313.0,+6/-5,9946.0,Apache 2.0,Unknown,https://huggingface.co/Qwen/QwQ-32B
gpt-4.5-preview-2025-02-27,OpenAI,65.93,54.42,76.07,67.94,60.07,64.76,72.33,https://openai.com/index/introducing-gpt-4-5/,5.0,4.0,GPT-4.5-Preview,1398.0,+4/-5,15275.0,Proprietary,Unknown,https://openai.com/index/introducing-gpt-4-5/
qwen3-30b-a3b-thinking,Alibaba,65.32,66.83,47.47,72.2,66.6,55.58,83.23,https://qwenlm.github.io/blog/qwen3/,,,,,,,,,
claude-3-7-sonnet-20250219-base,Anthropic,64.62,49.11,74.28,64.65,59.96,63.19,76.49,https://www.anthropic.com/news/claude-3-7-sonnet,34.0,16.0,Claude 3.7 Sonnet,1290.0,+6/-5,17387.0,Proprietary,Unknown,https://www.anthropic.com/news/claude-3-7-sonnet
grok-3-beta,xAI,63.17,48.53,73.58,62.75,55.63,53.8,84.74,https://x.ai/blog/grok-3,4.0,6.0,Grok-3-Preview-02-24,1403.0,+4/-4,14843.0,Proprietary,Unknown,https://x.ai/blog/grok-3
gpt-4.1-2025-04-14,OpenAI,62.99,44.39,73.19,62.39,66.4,54.55,77.05,https://openai.com/index/gpt-4-1/,10.0,6.0,GPT-4.1-2025-04-14,1366.0,+7/-8,5102.0,Proprietary,Unknown,https://openai.com/index/gpt-4-1/
deepseek-v3-0324,DeepSeek,62.82,44.28,68.91,71.44,64.02,46.82,81.47,https://huggingface.co/deepseek-ai/DeepSeek-V3-0324,8.0,6.0,DeepSeek-V3-0324,1373.0,+7/-5,8753.0,MIT,Unknown,https://api-docs.deepseek.com/news/news250325
chatgpt-4o-latest-2025-03-27,OpenAI,61.65,48.81,77.48,55.72,66.52,49.43,71.92,https://x.com/OpenAIDevs/status/1905335104211185999?t=pmYR2_xGFyWs1xOGuNxRsw&s=19,3.0,4.0,ChatGPT-4o-latest (2025-03-26),1408.0,+6/-6,10290.0,Proprietary,Unknown,https://x.com/OpenAI/status/1905331956856050135
gemini-2.0-flash-001,Google,60.05,44.25,64.74,63.19,59.92,42.39,85.79,https://blog.google/technology/google-deepmind/gemini-model-updates-february-2025/,11.0,18.0,Gemini-2.0-Flash-001,1355.0,+4/-3,24913.0,Proprietary,Unknown,https://aistudio.google.com/app/prompts/new_chat?instructions=lmsys-1121&model=gemini-2.0-flash-001
qwen2.5-max,Alibaba,60.03,38.53,66.79,56.87,64.27,58.37,75.35,https://qwenlm.github.io/blog/qwen2.5-max/,15.0,18.0,Qwen2.5-Max,1341.0,+4/-3,23180.0,Proprietary,Unknown,https://qwenlm.github.io/blog/qwen2.5-max/
gpt-4.1-mini-2025-04-14,OpenAI,59.05,53.78,72.11,58.78,61.34,38.0,70.31,https://openai.com/index/gpt-4-1/,21.0,17.0,GPT-4.1-mini-2025-04-14,1322.0,+6/-7,4950.0,Proprietary,Unknown,https://openai.com/index/gpt-4-1/
claude-3-5-sonnet-20241022,Anthropic,57.94,43.22,73.9,50.54,56.19,54.48,69.3,https://www.anthropic.com/news/3-5-models-and-computer-use,40.0,18.0,Claude 3.5 Sonnet (20241022),1283.0,+2/-3,65435.0,Proprietary,2024/4,https://www.anthropic.com/news/3-5-models-and-computer-use
learnlm-2.0-flash-experimental,Google,57.27,39.72,64.3,61.1,51.42,43.34,83.76,https://ai.google.dev/gemini-api/docs/learnlm,,,,,,,,,
phi-4-reasoning-plus,Microsoft,56.64,57.83,60.59,62.83,54.74,30.69,73.17,https://huggingface.co/microsoft/Phi-4-reasoning-plus,,,,,,,,,
mistral-medium-2505,Mistral AI,56.59,41.97,61.48,59.74,60.2,44.74,71.4,https://mistral.ai/news/mistral-medium-3,,,,,,,,,
deepseek-r1-distill-llama-70b,DeepSeek,55.51,59.81,46.65,58.8,60.81,37.05,69.94,https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B,,,,,,,,,
llama4-maverick-instruct-basic,Meta,55.19,43.83,54.19,60.58,47.11,49.65,75.75,https://ai.meta.com/blog/llama-4-multimodal-intelligence/,44.0,34.0,Llama-4-Maverick-17B-128E-Instruct,1270.0,+6/-6,7798.0,Llama 4,Unknown,https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct
step-2-16k-202411,StepFun,54.05,42.39,57.58,43.68,62.35,38.41,79.88,https://www.stepfun.com/#step2,24.0,32.0,Step-2-16K-Exp,1305.0,+8/-7,5125.0,Proprietary,Unknown,https://platform.stepfun.com/docs/llm/text
gpt-4o-2024-11-20,OpenAI,53.95,39.75,69.29,41.48,63.53,44.68,64.94,https://openai.com/index/hello-gpt-4o/,49.0,37.0,GPT-4o-2024-08-06,1265.0,+2/-3,47970.0,Proprietary,2023/10,https://platform.openai.com/docs/models/gpt-4o
gemini-2.0-flash-lite-001,Google,53.75,32.25,59.31,54.97,65.39,33.94,76.63,https://developers.googleblog.com/en/start-building-with-the-gemini-2-0-flash-family/,22.0,26.0,Gemini-2.0-Flash-Lite,1312.0,+3/-3,25020.0,Proprietary,Unknown,https://aistudio.google.com/prompts/new_chat?model=gemini-2.0-flash-lite
hunyuan-turbos-20250313,Tencent,50.77,38.22,50.35,57.47,47.99,34.46,76.13,https://cloud.tencent.com/document/product/1729/104753,24.0,26.0,Hunyuan-TurboS-20250226,1303.0,+11/-9,2449.0,Proprietary,Unknown,https://cloud.tencent.com/document/product/1729/104753
mistral-large-2411,Mistral AI,50.25,33.83,62.89,42.2,54.2,40.45,67.93,https://huggingface.co/mistralai/Mistral-Large-Instruct-2411,67.0,66.0,Mistral-Large-2411,1249.0,+4/-3,29643.0,MRL,Unknown,https://huggingface.co/mistralai/Mistral-Large-Instruct-2411
learnlm-1.5-pro-experimental,Google,49.3,34.86,58.93,56.71,39.3,37.86,68.16,https://ai.google.dev/gemini-api/docs/learnlm,,,,,,,,,
dracarys2-72b-instruct,AbacusAI,49.2,37.49,58.73,52.25,48.48,33.06,65.22,https://huggingface.co/abacusai/Dracarys2-72B-Instruct,,,,,,,,,
qwen2.5-72b-instruct-turbo,Alibaba,49.04,34.08,57.26,51.88,50.16,36.47,64.39,https://huggingface.co/Qwen/Qwen2.5-72B-Instruct,60.0,66.0,Qwen2.5-72B-Instruct,1257.0,+3/-2,41517.0,Qwen,2024/9,https://qwenlm.github.io/blog/qwen2.5/
llama-3.3-70b-instruct-turbo,Meta,48.86,32.53,51.82,41.4,40.79,43.97,82.67,https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct,60.0,50.0,Llama-3.3-70B-Instruct,1257.0,+3/-3,38088.0,Llama-3.3,Unknown,https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
gemma-3-27b-it,Google,48.44,34.42,48.94,52.27,38.8,41.31,74.9,https://blog.google/technology/developers/gemma-3/,15.0,18.0,Gemma-3-27B-it,1341.0,+5/-4,12343.0,Gemma,Unknown,http://aistudio.google.com/app/prompts/new_chat?model=gemma-3-27b-it
deepseek-r1-distill-qwen-32b,DeepSeek,47.4,44.36,46.33,60.13,46.94,30.92,55.71,https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B,,,,,,,,,
gpt-4.1-nano-2025-04-14,OpenAI,46.58,35.58,63.21,42.39,49.82,30.96,57.54,https://openai.com/index/gpt-4-1/,43.0,43.0,GPT-4.1-nano-2025-04-14,1271.0,+7/-6,5121.0,Proprietary,Unknown,https://openai.com/index/gpt-4-1/
dracarys2-llama-3.1-70b-instruct,AbacusAI,46.47,36.67,41.14,40.3,55.13,42.37,63.24,https://huggingface.co/abacusai/Dracarys2-Llama-3.1-70B-Instruct,,,,,,,,,
mistral-small-2503,Mistral AI,45.92,37.08,49.65,38.39,52.14,34.59,63.66,https://mistral.ai/news/mistral-small-3-1,79.0,82.0,Mistral-Small-24B-Instruct-2501,1218.0,+4/-4,15323.0,Apache 2.0,Unknown,https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501
claude-3-5-haiku-20241022,Anthropic,44.98,26.19,53.17,34.84,54.12,39.71,61.88,https://www.anthropic.com/claude/haiku,75.0,52.0,Claude 3.5 Haiku (20241022),1237.0,+3/-3,36354.0,Proprietary,Unknown,https://www.anthropic.com/news/3-5-models-and-computer-use
amazon.nova-pro-v1:0,Amazon,44.33,28.25,49.65,37.7,44.34,38.94,67.13,https://aws.amazon.com/ai/generative-ai/nova/,68.0,72.0,Amazon Nova Pro 1.0,1245.0,+3/-3,25794.0,Proprietary,Unknown,https://docs.aws.amazon.com/nova/latest/userguide/what-is-nova.html
gpt-4o-mini-2024-07-18,OpenAI,43.41,25.64,55.02,38.05,55.1,29.88,56.8,https://openai.com/index/hello-gpt-4o/,44.0,52.0,GPT-4o-mini-2024-07-18,1272.0,+2/-2,71363.0,Proprietary,2023/10,https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/
amazon.nova-lite-v1:0,Amazon,39.11,32.0,45.04,34.62,41.24,27.62,54.13,https://aws.amazon.com/ai/generative-ai/nova/,79.0,90.0,Amazon Nova Lite 1.0,1217.0,+3/-3,20652.0,Proprietary,Unknown,https://docs.aws.amazon.com/nova/latest/userguide/what-is-nova.html
command-r-plus-08-2024,Cohere,34.88,21.64,27.13,22.82,49.23,30.86,57.61,https://docs.cohere.com/docs/models,79.0,77.0,Command R+ (08-2024),1215.0,+5/-5,10537.0,CC-BY-NC-4.0,2024/8,https://docs.cohere.com/docs/command-r-plus#model-details
qwen2.5-7b-instruct-turbo,Alibaba,34.37,22.31,34.29,36.81,42.33,18.38,52.11,https://huggingface.co/Qwen/Qwen2.5-7B-Instruct,,,,,,,,,
amazon.nova-micro-v1:0,Amazon,33.67,25.42,28.92,34.15,41.29,24.19,48.04,https://aws.amazon.com/ai/generative-ai/nova/,90.0,103.0,Amazon Nova Micro 1.0,1198.0,+5/-3,20660.0,Proprietary,Unknown,https://docs.aws.amazon.com/nova/latest/userguide/what-is-nova.html
command-r-08-2024,Cohere,31.39,20.58,26.1,18.35,39.77,27.93,55.62,https://docs.cohere.com/docs/models,99.0,95.0,Command R (08-2024),1180.0,+6/-5,10849.0,CC-BY-NC-4.0,2024/8,https://docs.cohere.com/docs/command-r-plus#model-details
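The rows are consistent with Global Average being the unweighted mean of the six LiveBench category averages (Reasoning, Coding, Mathematics, Data Analysis, Language, IF), rounded to two decimals. For o3-2025-04-16-high, (93.33 + 76.71 + 85.0 + 67.02 + 76.0 + 86.17) / 6 = 80.705, reported as 80.71. This relationship is inferred from the numbers and is not stated in the table itself. The minimal sketch below checks it on a three-row excerpt and also turns the 95% Confidence Interval column into numeric bounds, assuming that "+a/-b" means the interval runs from Arena Score - b to Arena Score + a. Rows with blank Arena fields have no matched Arena entry. The helper names (`mean_of_categories`, `global_average_matches`, `arena_interval`) are hypothetical.

```python
import csv
import io
from typing import Optional, Tuple

# Three rows copied from the table above, trimmed to the columns used here.
# Blank Arena fields mean the LiveBench model has no matched Arena entry.
DATA = """\
Model Name (LiveBench),Global Average,Reasoning Average,Coding Average,Mathematics Average,Data Analysis Average,Language Average,IF Average,Arena Score,95% Confidence Interval
o3-2025-04-16-high,80.71,93.33,76.71,85.0,67.02,76.0,86.17,1413.0,+8/-7
qwen3-32b-thinking,71.03,77.75,64.24,75.58,68.29,55.15,85.17,,
command-r-08-2024,31.39,20.58,26.1,18.35,39.77,27.93,55.62,1180.0,+6/-5
"""

# The six category columns that, judging from the data, feed Global Average.
CATEGORIES = [
    "Reasoning Average",
    "Coding Average",
    "Mathematics Average",
    "Data Analysis Average",
    "Language Average",
    "IF Average",
]


def mean_of_categories(row: dict) -> float:
    """Unweighted mean of the six category averages in one row."""
    return sum(float(row[col]) for col in CATEGORIES) / len(CATEGORIES)


def global_average_matches(row: dict, tol: float = 0.005) -> bool:
    """True if 'Global Average' equals the category mean up to 2-decimal rounding."""
    # The tiny extra slack absorbs binary floating-point error.
    return abs(mean_of_categories(row) - float(row["Global Average"])) <= tol + 1e-9


def arena_interval(score: str, ci: str) -> Optional[Tuple[float, float]]:
    """Convert an Arena score and a '+a/-b' interval string into (low, high).

    Assumes '+a/-b' means [score - b, score + a]. Returns None when the
    row has no Arena match (empty fields).
    """
    if not score or not ci:
        return None
    plus, minus = ci.split("/")  # e.g. "+8" and "-7"
    center = float(score)
    # float("-7") is already negative, so adding it gives the lower bound.
    return center + float(minus), center + float(plus)


if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(DATA)):
        name = row["Model Name (LiveBench)"]
        interval = arena_interval(row["Arena Score"], row["95% Confidence Interval"])
        print(f"{name}: reported {row['Global Average']}, "
              f"recomputed {mean_of_categories(row):.3f}, "
              f"matches={global_average_matches(row)}, arena={interval}")
```

The check uses a tolerance rather than exact equality because half-way cases such as 80.705 can round either way depending on how the underlying, more precise scores were stored. Rows without an Arena match simply yield `arena=None`.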