MatchTime: Towards Automatic Soccer Game Commentary Generation
Abstract
Soccer is a globally popular sport with a vast audience. In this paper, we construct an automatic soccer game commentary model to improve the audience's viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for soccer game commentary generation, termed SN-Caption-test-align; Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale, creating a higher-quality soccer game commentary dataset for training, denoted MatchTime; Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice. Extensive experiments and ablation studies demonstrate the effectiveness of our alignment pipeline: training the model on the curated dataset achieves state-of-the-art performance for commentary generation, showing that better alignment leads to significant performance improvements in downstream tasks.
Community
Project Page: https://haoningwu3639.github.io/MatchTime/
Paper: https://arxiv.org/abs/2406.18530/
Code: https://github.com/jyrao/MatchTime/
To summarize, we make the following contributions:
(i) we show the effect of misalignment in automatic commentary generation evaluation by manually correcting the alignment errors in 49 soccer matches, which can later be used as a new benchmark for the community, termed SN-Caption-test-align;
(ii) we further propose a multi-modal temporal video-text alignment pipeline that corrects and filters existing soccer game commentary datasets at scale, resulting in a high-quality training dataset for commentary generation, named MatchTime;
(iii) we present a soccer game commentary model named MatchVoice, establishing a new state-of-the-art performance for automatic soccer game commentary generation.
Hi @haoningwu congrats on this work!
Are you interested in uploading the datasets to the hub instead of Google Drive, making it easier for the community to discover your work? See here https://huggingface.co/docs/datasets/image_dataset.
Also, are you planning to upload the model to the hub? See here for all details: https://huggingface.co/docs/hub/models-uploading. We also have Video-LLaVa available in the Transformers library: https://huggingface.co/LanguageBind/Video-LLaVA-7B-hf.
The dataset and models are uploaded; we will add more details in the future. FYR.
dataset: https://huggingface.co/datasets/Homie0609/MatchTime
model: https://huggingface.co/Homie0609/MatchTime/tree/main