---
base_model:
- Qwen/Qwen2.5-Omni-7B
datasets:
- antgroup/HumanSense_Benchmark
language:
- en
license: apache-2.0
metrics:
- accuracy
pipeline_tag: video-text-to-text
library_name: transformers
---
<div align="center" style="font-family: charter;">
<p align="center">
<img src="pic.png" width="400"/>
<p>
<!-- <h1></br>From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs</h1> -->
<div>
<a href="https://scholar.google.com/citations?user=sPQ |