zhifeixie committed · Commit 77deed4 · verified · 1 Parent(s): 0debca8

Update README.md

Files changed (1)
  1. README.md +0 -15
README.md CHANGED
@@ -4,11 +4,7 @@ license: mit
 
 
  # Audio-Reasoner
- <p align="center">
- <img controls src="https://github.com/xzf-thu/Audio-Reasoner/blob/main/assets/title.png" title="v" width="80%"/>
- </p>
 
- ## Abstract
  We implemented inference scaling on **Audio-Reasoner**, a large audio language model, enabling **deepthink** and **structured chain-of-thought (COT) reasoning** for multimodal understanding and reasoning. To achieve this, we constructed CoTA, a high-quality dataset with **1.2M reasoning-rich samples** using structured COT techniques. Audio-Reasoner achieves state-of-the-art results on **MMAU-mini(+25.42%)** and **AIR-Bench-Chat(+14.57%)** benchmarks.
 
  <p align="center">
@@ -23,11 +19,6 @@ If you like us, pls give us a star⭐ !
 
 
  ## Main Results
- <p align="center">
- <img src="assets\main_result.png" width="80%"/>
- </p>
-
-
 
 
  ## News and Updates
@@ -155,12 +146,6 @@ Audio - Reasoner can understand various types of audio, including sound, music,
  **2. Why is transformers installed after 'ms-swift' in the environment configuration?**
  The version of transformers has a significant impact on the performance of the model. We have tested that version `transformers==4.49.1` is one of the suitable versions. Installing ms-swift first may ensure a more stable environment for the subsequent installation of transformers to avoid potential version conflicts that could affect the model's performance.
 
- ## More Cases
- <p align="center">
- <img src="assets\figure2-samples.png" width="90%"/>
- </p>
-
-
  ## Contact
 
  If you have any questions, please feel free to contact us via `[email protected]`.
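
The FAQ entry in the last hunk pins `transformers==4.49.1` and recommends installing it after ms-swift. A minimal sketch of how one might confirm that the pin survived installation, assuming Python 3.8+; `pin_status` and `PINNED_TRANSFORMERS` are hypothetical names used for illustration and are not part of the Audio-Reasoner repository:

```python
from importlib import metadata

# Version string quoted in the README FAQ; adjust if the README changes.
PINNED_TRANSFORMERS = "4.49.1"


def pin_status(package: str, pinned: str) -> str:
    """Return 'ok', 'mismatch', or 'missing' for an installed package."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return "missing"
    return "ok" if installed == pinned else "mismatch"


if __name__ == "__main__":
    print("transformers:", pin_status("transformers", PINNED_TRANSFORMERS))
```

A result of `mismatch` after installing ms-swift would suggest that another dependency replaced the pinned version, which is the kind of conflict the FAQ describes.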
 