Inference Provider

VERIFIED
2,798,819 monthly requests

AI & ML interests

AI-centric cloud platform ready for intensive workloads. Training-ready platform with NVIDIA® H100 Tensor Core GPUs. Competitive pricing. Dedicated support.

Recent Activity

Articles

GLM 4.5 (#6, opened 11 days ago by t1u1)
ibragim-bad posted an update 14 days ago
We tested Qwen3-Coder, GPT-5, and 30+ other models on new SWE-bench-like tasks from July 2025!

Hi all, I’m Ibragim from Nebius.

We ran a benchmark on 34 fresh GitHub PR tasks from July 2025 using the SWE-rebench leaderboard (https://swe-rebench.com/leaderboard). These are real, recent problems with no training-set contamination, and the evaluation covers both proprietary and open-source models.

Quick takeaways:

> GPT-5-Medium leads overall (29.4% resolved rate, 38.2% pass@5).
> Qwen3-Coder is the best open-source performer, matching GPT-5-High in pass@5 (32.4%) despite a lower resolved rate.
> Claude Sonnet 4.0 lags behind in pass@5 at 23.5%.
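
For readers unfamiliar with the two metrics, here is a minimal sketch of how they could be computed from per-task run outcomes. It assumes resolved rate is the share of successful runs across all tasks and pass@5 is the share of tasks solved in at least one of 5 runs; the leaderboard's exact definitions may differ, and the task names and outcomes below are made-up toy data.

```python
from typing import Dict, List


def resolved_rate(results: Dict[str, List[bool]]) -> float:
    """Share of successful runs across all tasks (assumed definition)."""
    total_runs = sum(len(runs) for runs in results.values())
    solved_runs = sum(sum(runs) for runs in results.values())
    return solved_runs / total_runs


def pass_at_k(results: Dict[str, List[bool]], k: int = 5) -> float:
    """Share of tasks solved in at least one of the first k runs (assumed definition)."""
    solved_tasks = sum(1 for runs in results.values() if any(runs[:k]))
    return solved_tasks / len(results)


# Hypothetical outcomes: task id -> success flag for each of 5 runs.
toy_results = {
    "task-a": [True, True, False, True, True],
    "task-b": [False, False, False, False, False],
    "task-c": [False, True, False, False, False],
    "task-d": [False, False, False, False, False],
}

print(f"resolved rate: {resolved_rate(toy_results):.1%}")
print(f"pass@5:        {pass_at_k(toy_results, k=5):.1%}")
```

Under these assumed definitions, a pass@5 of 38.2% on 34 tasks would correspond to 13 tasks solved at least once (13/34), 32.4% to 11 tasks, and 23.5% to 8 tasks. This also shows why a model can match another on pass@5 while having a lower resolved rate: it solves the same number of tasks at least once, but less consistently across runs.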

All tasks come from nebius/SWE-rebench-leaderboard, a continuously updated, decontaminated set of real-world SWE tasks.
