ZhangYuanhan committed
Commit cc1e179
1 Parent(s): ecc105e

Update README.md

Files changed (1): README.md (+4 -4)
README.md CHANGED
@@ -117,7 +117,7 @@ base_model:
 - lmms-lab/llava-onevision-qwen2-7b-si
 ---
 
-# LLaVA-Video-7B-Qwen2
+# LLaVA-NeXT-Video-7B-Qwen2
 
 ## Table of Contents
 
@@ -130,7 +130,7 @@ base_model:
 
 ## Model Summary
 
-The LLaVA-Video models are 7/72B parameter models trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data), based on Qwen2 language model with a context window of 32K tokens.
+The LLaVA-NeXT-Video models are 7/72B parameter models trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data), based on Qwen2 language model with a context window of 32K tokens.
 
 - **Repository:** [LLaVA-VL/LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT?tab=readme-ov-file)
 - **Point of Contact:** [Yuanhan Zhang](https://zhangyuanhan-ai.github.io/)
@@ -141,7 +141,7 @@ The LLaVA-Video models are 7/72B parameter models trained on [LLaVA-Video-178K](
 
 ### Intended use
 
-The model was trained on [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and have the ability to interact with images, multi-image and videos, but specific to videos.
+The model was trained on [LLaVA-NeXT-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Video-SFT-Data) and have the ability to interact with images, multi-image and videos, but specific to videos.
 
 **Feel free to share your generations in the Community tab!**
 
@@ -182,7 +182,7 @@ def load_video(self, video_path, max_frames_num,fps=1,force_sample=False):
     spare_frames = vr.get_batch(frame_idx).asnumpy()
     # import pdb;pdb.set_trace()
     return spare_frames,frame_time,video_time
-pretrained = "lmms-lab/LLaVA-Video-7B-Qwen2"
+pretrained = "lmms-lab/LLaVA-NeXT-Video-7B-Qwen2"
 model_name = "llava_qwen"
 device = "cuda"
 device_map = "auto"
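
For readers adapting the usage snippet, the hunk at line 182 only shows the tail of the README's `load_video` helper and the model-setup variables. The sketch below suggests how those pieces typically fit together; it assumes the decord `VideoReader` and the `load_pretrained_model` builder from the LLaVA-VL/LLaVA-NeXT repository, and names such as `sample.mp4` are placeholders, so treat it as illustrative rather than authoritative.

```python
# Illustrative sketch, not the full README example. Assumes the decord package and
# the LLaVA-VL/LLaVA-NeXT repo are installed; `load_pretrained_model` and the
# "llava_qwen" model name come from that repo, not from this commit's diff.
import numpy as np
from decord import VideoReader, cpu
from llava.model.builder import load_pretrained_model

def load_video(video_path, max_frames_num, fps=1, force_sample=False):
    """Sample frames from a video, returning them with timestamps and the clip length."""
    vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
    total_frame_num = len(vr)
    video_time = total_frame_num / vr.get_avg_fps()      # clip duration in seconds
    step = round(vr.get_avg_fps() / fps)                 # take roughly `fps` frames per second
    frame_idx = list(range(0, total_frame_num, step))
    if len(frame_idx) > max_frames_num or force_sample:
        # Fall back to uniform sampling so at most `max_frames_num` frames are decoded.
        frame_idx = np.linspace(0, total_frame_num - 1, max_frames_num, dtype=int).tolist()
    frame_time = ",".join(f"{i / vr.get_avg_fps():.2f}s" for i in frame_idx)
    spare_frames = vr.get_batch(frame_idx).asnumpy()     # (num_frames, H, W, 3) uint8 array
    return spare_frames, frame_time, video_time

# Checkpoint id as written in this commit's "+" line.
pretrained = "lmms-lab/LLaVA-NeXT-Video-7B-Qwen2"
model_name = "llava_qwen"
device = "cuda"
device_map = "auto"

# The builder returns the tokenizer, model, image processor, and max sequence length.
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, model_name, torch_dtype="bfloat16", device_map=device_map
)
model.eval()

frames, frame_time, video_time = load_video("sample.mp4", max_frames_num=64, force_sample=True)
video = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"].to(device).bfloat16()
```

Note that the only change this commit makes to that part of the README is the `pretrained` checkpoint string; the surrounding frame-sampling and loading code is unchanged context.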