---
license: apache-2.0
base_model:
- Qwen/Qwen3-32B
---

<style>
.no-border-table table, .no-border-table th, .no-border-table td {
  border: none !important;
}
</style>

<div class="no-border-table">

| | |
|-|-|
| [![GitHub](https://img.shields.io/badge/GitHub-Repository-black?logo=github)](https://github.com/inclusionAI/AWorld/tree/main/train) | [![arXiv](http://img.shields.io/badge/cs.AI-arXiv%3A2508.20404-B31B1B.svg?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2508.20404) |

</div>

# Qwen3-32B-AWorld

## Model Description

**Qwen3-32B-AWorld** is a large language model fine-tuned from `Qwen3-32B` that specializes in agentic capabilities and proficient tool use. The model excels at complex agent tasks through precise integration with external tools, achieving a pass@1 score on the GAIA benchmark that surpasses GPT-4o and is comparable to that of DeepSeek-V3.

## Quick Start

This guide provides instructions for quickly deploying `Qwen3-32B-AWorld` and running inference against it with vLLM.

### Deployment with vLLM

To deploy the model, use the following `vllm serve` command:

```bash
vllm serve inclusionAI/Qwen3-32B-AWorld \
  --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.85 \
  --dtype bfloat16 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

**Key Configuration:**

* **Deployment Recommendation:** We recommend deploying the model on **8 GPUs** to improve concurrency, with `--tensor-parallel-size` set to the number of GPUs you are using (e.g., `8` in the command above).
* **Tool Usage Flags:** To enable the model's tool-calling capabilities, you must include the `--enable-auto-tool-choice` and `--tool-call-parser hermes` flags. These let the model choose tools automatically and let vLLM parse its tool calls correctly.
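
Once the server is up, a quick sanity check is to list the served models through the OpenAI-compatible API. Below is a minimal sketch, assuming a local deployment on the default port `8000`:

```python
import requests

# List the models served by the vLLM OpenAI-compatible server.
# Assumes a local deployment on the default port 8000; adjust the URL as needed.
response = requests.get("http://localhost:8000/v1/models")
response.raise_for_status()

# Expect "inclusionAI/Qwen3-32B-AWorld" among the served model IDs.
for model in response.json()["data"]:
    print(model["id"])
```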

### Making Inference Calls

When making an inference request, include the `tools` you want the model to be able to call; the format follows the official OpenAI API specification.

Here is a complete Python example that calls the deployed model with the `requests` library and exposes a single search tool to it:

```python
import requests
import json

# Define the tools available for the model to use
tools = [
    {
        "type": "function",
        "function": {
            "name": "mcp__google-search__search",
            "description": "Perform a web search query",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "description": "Search query",
                        "type": "string"
                    },
                    "num": {
                        "description": "Number of results (1-10)",
                        "type": "number"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Define the user's prompt
messages = [
    {
        "role": "user",
        "content": "Search for Hangzhou's weather today."
    }
]

# Set generation parameters
temperature = 0.6
top_p = 0.95
top_k = 20
min_p = 0

# Prepare the request payload
data = {
    "model": "inclusionAI/Qwen3-32B-AWorld",  # the served model name
    "messages": messages,
    "tools": tools,
    "temperature": temperature,
    "top_p": top_p,
    "top_k": top_k,
    "min_p": min_p,
}

# The endpoint of the vLLM OpenAI-compatible server.
# Replace {your_ip} and {your_port} with the actual IP address and port of your server.
url = "http://{your_ip}:{your_port}/v1/chat/completions"

# Send the POST request
response = requests.post(
    url,
    headers={"Content-Type": "application/json"},
    data=json.dumps(data)
)

# Print the response from the server
print("Status Code:", response.status_code)
print("Response Body:", response.text)
```

**Note:**

* Remember to replace `{your_ip}` and `{your_port}` in the `url` variable with the actual IP address and port where your vLLM server is running. The default port is typically `8000`.
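
Because vLLM exposes an OpenAI-compatible API, you can also call the server with the official `openai` Python client instead of raw `requests`. Below is a minimal sketch under the same assumptions (a default vLLM server ignores the API key, but the client requires a non-empty value):

```python
from openai import OpenAI

# Point the official OpenAI client at the vLLM server.
# Replace {your_ip} and {your_port} as in the example above.
client = OpenAI(base_url="http://{your_ip}:{your_port}/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="inclusionAI/Qwen3-32B-AWorld",
    messages=[{"role": "user", "content": "Search for Hangzhou's weather today."}],
    tools=tools,  # same tool schema as in the requests example
    temperature=0.6,
    top_p=0.95,
    extra_body={"top_k": 20, "min_p": 0},  # vLLM-specific sampling parameters
)
print(completion.choices[0].message)
```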
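
When the model decides to use a tool, the returned `message` contains a `tool_calls` entry instead of plain text, and it is up to the caller to execute the call and send the result back. Below is a sketch of that loop, continuing from the `requests` example (it reuses `url`, `tools`, and `messages`) with a hypothetical `run_search` helper standing in for a real search backend:

```python
import json
import requests

def run_search(query, num=5):
    # Hypothetical stand-in for a real search backend
    # (e.g., an MCP google-search server); returns a plain-text result.
    return f"[stub] top {num} results for: {query}"

def chat(messages):
    # Reuses `url` and `tools` from the example above.
    payload = {"model": "inclusionAI/Qwen3-32B-AWorld", "messages": messages, "tools": tools}
    response = requests.post(url, json=payload)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]

message = chat(messages)
if message.get("tool_calls"):
    call = message["tool_calls"][0]
    args = json.loads(call["function"]["arguments"])

    # Append the assistant's tool call and the tool's result, then ask again.
    messages.append(message)
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": run_search(args["query"], int(args.get("num", 5))),
    })
    message = chat(messages)

print(message["content"])
```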