myownskyW7 committed on
Commit 0dd2645 · verified · 1 Parent(s): 90ae98e

Update README.md

Files changed (1)
  1. README.md +89 -3
README.md CHANGED
@@ -1,3 +1,89 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: other
+ pipeline_tag: visual-question-answering
+ ---
+
+ <p align="center">
+ <img src="assets/logo_en.png" width="650"/>
+ </p>
+ <p align="center">
+ <b><font size="6">InternLM-XComposer 2.5 OmniLive</font></b>
+ </p>
+
+ [💻GitHub Repo](https://github.com/InternLM/InternLM-XComposer)
+
+ **InternLM-XComposer2.5-OL** is a specialized generalist multimodal system for streaming video and audio interactions.
+
+ <div align="center">
+ InternLM-XComposer2.5-OmniLive <a href="https://huggingface.co/internlm/internlm-xcomposer2d5-ol-7b">🤗</a> <a href="https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2d5-ol-7b"><img src="../assets/modelscope_logo.png" width="20px"></a> &nbsp;| XComposer2.5 OmniLive Technical Report <a href="https://arxiv.org/abs/2407.03320"> 📄 </a>
+ </div>
+
+
+ ## Quickstart
+
+ We provide simple examples below that show how to use InternLM-XComposer2.5-OL: audio understanding with ms-swift and image understanding with 🤗 Transformers. For the complete guide, please refer to [examples/README.md](examples/README.md).
+
+
+ <details>
+ <summary>
+ <b>Audio Understanding</b>
+ </summary>
+
+ ```python
+ import os
+ # Have ms-swift download weights from the Hugging Face Hub instead of ModelScope
+ os.environ['USE_HF'] = 'True'
+
+ import torch
+ from swift.llm import (
+     get_model_tokenizer, get_template, ModelType,
+     get_default_template_type, inference
+ )
+ from swift.utils import seed_everything
+
+ model_type = ModelType.qwen2_audio_7b_instruct
+ model_id_or_path = 'internlm/internlm-xcomposer2d5-ol-7b'
+ template_type = get_default_template_type(model_type)
+ print(f'template_type: {template_type}')
+
+ # Load the audio model and tokenizer in float16 on the first GPU
+ model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path, model_dir='audio',
+                                        model_kwargs={'device_map': 'cuda:0'})
+ model.generation_config.max_new_tokens = 256
+ template = get_template(template_type, tokenizer)
+ seed_everything(42)
+
+ # Chinese ASR: detect the spoken language and transcribe the speech
+ query = '<audio>Detect the language and recognize the speech.'
+ response, _ = inference(model, template, query, audios='examples/audios/chinese.mp3')
+ print(f'query: {query}')
+ print(f'response: {response}')
+ ```
+
+ </details>
+
+
+ <details>
+ <summary>
+ <b>Image Understanding</b>
+ </summary>
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ torch.set_grad_enabled(False)
+
+ # init model and tokenizer
+ model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-ol-7b', model_dir='base', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval().half()
+ tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-ol-7b', model_dir='base', trust_remote_code=True)
+ model.tokenizer = tokenizer
+
+ query = 'Analyze the given image in a detailed manner'
+ image = ['examples/images/dubai.png']
+ # Run inference under fp16 autocast with deterministic beam search (3 beams)
+ with torch.autocast(device_type='cuda', dtype=torch.float16):
+     response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
+ print(response)
+ ```
+
+ </details>
+
+ ## Open Source License
+ The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, please fill in the application form (English or Chinese version). For other questions or collaborations, please contact [email protected].