MatchTime: Towards Automatic Soccer Game Commentary Generation
Abstract
Soccer is a globally popular sport with a vast audience. In this paper, we construct an automatic soccer game commentary model to improve the audience's viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for soccer game commentary generation, termed SN-Caption-test-align; Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale, creating a higher-quality soccer game commentary dataset for training, denoted MatchTime; Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice. Extensive experiments and ablation studies demonstrate the effectiveness of our alignment pipeline: training the model on the curated dataset achieves state-of-the-art performance for commentary generation, showing that better alignment leads to significant performance improvements in downstream tasks.
Community
Project Page: https://haoningwu3639.github.io/MatchTime/
Paper: https://arxiv.org/abs/2406.18530/
Code: https://github.com/jyrao/MatchTime/
To summarize, we make the following contributions:
(i) we show the effect of misalignment in automatic commentary generation evaluation by manually correcting the alignment errors in 49 soccer matches, which can later be used as a new benchmark for the community, termed SN-Caption-test-align;
(ii) we further propose a multi-modal temporal video-text alignment pipeline that corrects and filters existing soccer game commentary datasets at scale, resulting in a high-quality training dataset for commentary generation, named MatchTime;
(iii) we present a soccer game commentary model named MatchVoice, establishing a new state-of-the-art performance for automatic soccer game commentary generation.
Hi @haoningwu congrats on this work!
Are you interested in uploading the datasets to the hub instead of Google Drive, making it easier for the community to discover your work? See here https://huggingface.co/docs/datasets/image_dataset.
Also, are you planning to upload the model to the hub? See here for all details: https://huggingface.co/docs/hub/models-uploading. We also have Video-LLaVa available in the Transformers library: https://huggingface.co/LanguageBind/Video-LLaVA-7B-hf.
The dataset and models are uploaded; we will add more details in the future. FYR.
dataset: https://huggingface.co/datasets/Homie0609/MatchTime
model: https://huggingface.co/Homie0609/MatchTime/tree/main