BASH-Lab
/

RAVEN-AV-7B

Model card Files Files and versions

RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language

Project Page: https://bashlab.github.io/raven_project/

🛠️ Requirements and Installation

Basic Dependencies:

Python >= 3.8
Pytorch >= 2.2.0
CUDA Version >= 11.8
transformers == 4.40.0 (for reproducing paper results)
tokenizers == 0.19.1

cd RAVEN
pip install -r requirements.txt
pip install flash-attn==2.5.8 --no-build-isolation
pip install opencv-python==4.5.5.64
apt-get update && apt-get install ffmpeg libsm6 libxext6  -y

🤖 Inference

STEP 1: Download $\texttt{siglip-so400m-patch14-384}$ from here google/siglip-so400m-patch14-384
STEP 2: Download RAVEN checkpoint

CUDA_VISIBLE_DEVICES=0 python inference.py --model-path=<MODEL PATH> --modal-type=<MODAL TYPE>

👍 Acknowledgement

The codebase of RAVEN is adapted from VideoLLaMA2. We are also grateful for their contribution.

Downloads last month: 6

Safetensors

Model size

8.6B params

Tensor type

BF16

·

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support