Model Card for Model ID

This model card describes TEMPURA, a vision-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of untrimmed videos.

Model Details

Model Description

TEMPURA enhances video temporal understanding by integrating causal reasoning with fine-grained temporal segmentation. More details can be found on the project page.

Developed by: Jen-Hao Cheng, Vivian Wang, Huayu Wang, Huapeng Zhou, Yi-Hao Peng, Hou-I Liu, Hsiang-Wei Huang, Kuang-Ming Chen, Cheng-Yen Yang, Wenhao Chai, Yi-Ling Chen, Vibhav Vineet, Qin Cai, Jenq-Neng Hwang
Model type: Video-Language Model
Language(s) (NLP): English
License: cc-by-4.0
Finetuned from model: Qwen/Qwen2.5-VL-3B-Instruct

Model Sources

Repository: https://github.com/andy-cheng/TEMPURA
Paper: TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Project Page: https://andy-cheng.github.io/TEMPURA/

Uses

Direct Use

The model can be used directly for temporal grounding and highlight detection in videos.

Downstream Use [optional]

The model can be fine-tuned for various applications requiring temporal video understanding, such as video summarization, event extraction, and question answering.

Out-of-Scope Use

The model may not perform well on videos with significantly different visual styles or languages compared to the training data.

Bias, Risks, and Limitations

The model's performance is influenced by biases present in the VER dataset. Further analysis is needed to fully characterize these biases.

Recommendations

Users should be aware of potential biases in the model's outputs.

How to Get Started with the Model

Inference: Please check the inference example.

Training: Please check the model training script.

Training Details

Training Data

The model was trained on the VER dataset (https://huggingface.co/datasets/andaba/TEMPURA-VER).

Training Procedure

The training procedure involves masked event prediction and video event segmentation with temporal dense captioning. See the training scripts in the repository for details.

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation

BibTeX:

@article{tempura,
       title={TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action}, 
       author={Jen-Hao Cheng and Vivian Wang and Huayu Wang and Huapeng Zhou and Yi-Hao Peng and Hou-I Liu
              and Hsiang-Wei Huang and Kuang-Ming Chen and Cheng-Yen Yang
              and Wenhao Chai and Yi-Ling Chen and Vibhav Vineet and Qin Cai and Jenq-Neng Hwang},
       journal={arXiv preprint arXiv:2505.01583},
       year={2025}
}

APA:

Cheng, J.-H., Wang, V., Wang, H., Zhou, H., Peng, Y.-H., Liu, H.-I., Huang, H.-W., Chen, K.-M., Yang, C.-Y., Chai, W., Chen, Y.-L., Vineet, V., Cai, Q., & Hwang, J.-N. (2025). TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action. arXiv preprint arXiv:2505.01583.

Model Card Contact

Jen-Hao Cheng, [email protected]

andaba
/

TEMPURA-Qwen2.5-VL-3B-s2

Model Card for Model ID

Model Details

Model Description

Model Sources

Uses

Direct Use

Downstream Use [optional]

Out-of-Scope Use

Bias, Risks, and Limitations

Recommendations

How to Get Started with the Model

Training Details

Training Data

Training Procedure

Training Hyperparameters

Speeds, Sizes, Times

Evaluation

Testing Data, Factors & Metrics

Testing Data

Factors

Metrics

Results

Summary

Environmental Impact

Technical Specifications [optional]

Model Architecture and Objective

Compute Infrastructure

Hardware

Software

Citation

Model Card Contact

Model tree for andaba/TEMPURA-Qwen2.5-VL-3B-s2

Dataset used to train andaba/TEMPURA-Qwen2.5-VL-3B-s2

Collection including andaba/TEMPURA-Qwen2.5-VL-3B-s2

TEMPURA