---
license: apache-2.0
language:
- en
tags:
- chat
- audio
---

# Qwen2-Audio-7B

## Introduction

Qwen2-Audio is the new series of Qwen large audio-language models. Qwen2-Audio can accept various audio signal inputs and perform audio analysis or respond directly in text to speech instructions. We introduce two distinct audio interaction modes:

* voice chat: users can freely engage in voice interactions with Qwen2-Audio without text input;

* audio analysis: users can provide audio together with text instructions for analysis during the interaction.

We release Qwen2-Audio-7B and Qwen2-Audio-7B-Instruct, which are the pretrained base model and the chat model, respectively.

For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2/), [GitHub](https://github.com/QwenLM/Qwen2-Audio), and [Report](https://www.arxiv.org/abs/2407.10759).
<br>

## Requirements
The code of Qwen2-Audio is included in the latest Hugging Face `transformers`, and we advise you to install `transformers>=4.44.0`; otherwise you might encounter the following error:
```
KeyError: 'qwen2-audio'
```
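
To confirm that your environment satisfies this requirement before loading the model, a minimal check is sketched below (the use of the `packaging` library is an assumption; any version-comparison utility works):

```python
import transformers
from packaging import version

# Qwen2-Audio support requires transformers >= 4.44.0;
# older versions raise KeyError: 'qwen2-audio' when loading the model.
if version.parse(transformers.__version__) < version.parse("4.44.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old for Qwen2-Audio; "
        "please upgrade, e.g. `pip install -U 'transformers>=4.44.0'`."
    )
```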

## Quickstart

Below is a code snippet illustrating how to load both the processor and the model, and how to run the pretrained Qwen2-Audio base model for content generation.

```python
import requests
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration
from transformers.pipelines.audio_utils import ffmpeg_read

# Load the pretrained base model and its processor
model = Qwen2AudioForConditionalGeneration.from_pretrained("Qwen/Qwen2-Audio-7B", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B", trust_remote_code=True)

# The special tokens mark where the audio is placed inside the text prompt
prompt = "<|audio_bos|><|AUDIO|><|audio_eos|>Generate the caption in English:"
url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/glass-breaking-151256.mp3"
# Download the audio and decode it at the sampling rate expected by the feature extractor
audio = ffmpeg_read(requests.get(url).content, sampling_rate=processor.feature_extractor.sampling_rate)

inputs = processor(text=prompt, audios=audio, return_tensors="pt")

# Generate, then drop the prompt tokens before decoding the continuation
generated_ids = model.generate(**inputs, max_length=256)
generated_ids = generated_ids[:, inputs.input_ids.size(1):]
response = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
# Glass is breaking.
```
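
If your audio is stored locally rather than fetched from a URL, the same pipeline applies once the file is decoded to a waveform at the expected sampling rate. The sketch below is an illustrative variation, not part of the original snippet; the `librosa` dependency and the `example.wav` path are assumptions:

```python
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model = Qwen2AudioForConditionalGeneration.from_pretrained("Qwen/Qwen2-Audio-7B", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B", trust_remote_code=True)

# "example.wav" is a placeholder path; resample to the rate the feature extractor expects
audio, _ = librosa.load("example.wav", sr=processor.feature_extractor.sampling_rate)

prompt = "<|audio_bos|><|AUDIO|><|audio_eos|>Generate the caption in English:"
inputs = processor(text=prompt, audios=audio, return_tensors="pt")

generated_ids = model.generate(**inputs, max_length=256)
generated_ids = generated_ids[:, inputs.input_ids.size(1):]
response = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(response)
```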

## Citation

If you find our work helpful, feel free to cite it.

```BibTeX
@article{Qwen2-Audio,
  title={Qwen2-Audio Technical Report},
  author={Chu, Yunfei and Xu, Jin and Yang, Qian and Wei, Haojie and Wei, Xipin and Guo, Zhifang and Leng, Yichong and Lv, Yuanjun and He, Jinzheng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
  journal={arXiv preprint arXiv:2407.10759},
  year={2024}
}
```

```BibTeX
@article{Qwen-Audio,
  title={Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models},
  author={Chu, Yunfei and Xu, Jin and Zhou, Xiaohuan and Yang, Qian and Zhang, Shiliang and Yan, Zhijie and Zhou, Chang and Zhou, Jingren},
  journal={arXiv preprint arXiv:2311.07919},
  year={2023}
}
```