Update README.md
README.md
CHANGED
@@ -4,11 +4,7 @@ license: mit
 
 
 # Audio-Reasoner
-<p align="center">
-<img controls src="https://github.com/xzf-thu/Audio-Reasoner/blob/main/assets/title.png" title="v" width="80%"/>
-</p>
 
-## Abstract
 We implemented inference scaling on **Audio-Reasoner**, a large audio language model, enabling **deepthink** and **structured chain-of-thought (COT) reasoning** for multimodal understanding and reasoning. To achieve this, we constructed CoTA, a high-quality dataset with **1.2M reasoning-rich samples** using structured COT techniques. Audio-Reasoner achieves state-of-the-art results on **MMAU-mini(+25.42%)** and **AIR-Bench-Chat(+14.57%)** benchmarks.
 
 <p align="center">
@@ -23,11 +19,6 @@ If you like us, pls give us a star⭐ !
 
 
 ## Main Results
-<p align="center">
-<img src="assets\main_result.png" width="80%"/>
-</p>
-
-
 
 
 ## News and Updates
@@ -155,12 +146,6 @@ Audio - Reasoner can understand various types of audio, including sound, music,
 **2. Why is transformers installed after 'ms-swift' in the environment configuration?**
 The version of transformers has a significant impact on the performance of the model. We have tested that version `transformers==4.49.1` is one of the suitable versions. Installing ms-swift first may ensure a more stable environment for the subsequent installation of transformers to avoid potential version conflicts that could affect the model's performance.
 
-## More Cases
-<p align="center">
-<img src="assets\figure2-samples.png" width="90%"/>
-</p>
-
-
 ## Contact
 
 If you have any questions, please feel free to contact us via `[email protected]`.