First Experiment (Qwen/Qwen3-Coder-30B-A3B-Instruct)
A creds.txt file containing the base URL, API key, and model name will be available within the hour.
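For anyone wanting to connect, here is a minimal sketch of a client, assuming creds.txt holds three lines (base URL, API key, model name) and that the server exposes the OpenAI-compatible /v1/chat/completions route. The file layout, function names, and values are illustrative assumptions, not the published format.

```python
# Minimal client sketch. Assumes creds.txt has three lines:
# base URL, API key, model name (an illustrative layout, not the real file).
import requests


def load_creds(path="creds.txt"):
    """Read base URL, API key, and model name, one per line."""
    with open(path) as f:
        base_url, api_key, model = [line.strip() for line in f][:3]
    return base_url, api_key, model


def chat(prompt, path="creds.txt"):
    """Send a single-turn request to the OpenAI-compatible endpoint."""
    base_url, api_key, model = load_creds(path)
    resp = requests.post(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

The same credentials should work with any OpenAI-compatible client library pointed at the base URL.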
Update - August 2, 2025:
The first experiment has concluded, and it was an overwhelming success! I had fully anticipated server crashes, freezes, and frequent downtime, especially considering I publicly shared the credentials online, yet none of that happened.
Several key lessons emerged from this experiment:
- We had more GPUs than necessary.
- The 30K context length proved too restrictive; given the number of available H100 GPUs, we should consider at least a 128K context length in future tests.
- The rate limit of 1 request per second per IP was probably too conservative for the intended use cases (coding assistants, DSPy apps, and similar applications), but it contributed significantly to server stability and helped prevent overload.
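The per-IP limit described above could be implemented in many ways; here is one sketch using a token bucket that refills at 1 token per second per IP. The class name, the small burst allowance, and the parameter names are illustrative assumptions, not the actual server configuration.

```python
# Sketch of a per-IP rate limiter: a token bucket refilling at `rate`
# tokens per second, capped at `burst`. Each request costs one token.
import time
from collections import defaultdict


class PerIPRateLimiter:
    def __init__(self, rate=1.0, burst=2.0):
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # maximum bucket size (allows short bursts)
        # Each IP starts with a full bucket: (tokens, last_refill_time).
        self.buckets = defaultdict(lambda: (burst, time.monotonic()))

    def allow(self, ip):
        """Return True if the request from `ip` is within the limit."""
        tokens, last = self.buckets[ip]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

A reverse proxy would call `allow(client_ip)` before forwarding each request and return HTTP 429 when it is False.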
Initial statistics from the experiment are as follows:
📊 OVERALL STATISTICS
Total requests: 2,246
Unique IP addresses: 41
Time period: 2025-07-31 19:22:39+00:00 to 2025-08-01 21:21:24+00:00
Duration: 1 day, 1:58:45
⚡ TRAFFIC METRICS
Peak requests per minute: 452 (at 2025-08-01 21:08)
Average response time: 1.903s (from first token to the last token)
👥 TOP USERS (by request count)
1. x.x.x.91    831 requests (37.0%)
2. x.x.x.1     724 requests (32.2%)
3. x.x.x.133   314 requests (14.0%)
4. x.x.x.25    122 requests ( 5.4%)
5. x.x.x.81     59 requests ( 2.6%)
6. x.x.x.41     45 requests ( 2.0%)
7. x.x.x.165    26 requests ( 1.2%)
8. x.x.x.24     16 requests ( 0.7%)
9. x.x.x.67     13 requests ( 0.6%)
10. x.x.x.244   12 requests ( 0.5%)
🎯 TOP ENDPOINTS
/v1/chat/completions 2,122 requests ( 94.5%)
/v1/models 59 requests ( 2.6%)
🤖 USER AGENTS
OpenAI/Python 1.96.1     830 requests (37.0%)
python-requests/2.32.3   600 requests (26.7%)
Ws/JS 4.83.0             335 requests (14.9%)
RooCode/3.25.5           148 requests ( 6.6%)
python-requests/2.25.1