arxiv:2406.18530

MatchTime: Towards Automatic Soccer Game Commentary Generation

Published on Jun 26
· Submitted by haoningwu on Jun 27

Abstract

Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audience's viewing experience. We make the following contributions. First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for soccer game commentary generation, termed SN-Caption-test-align. Second, we propose a multi-modal temporal alignment pipeline that automatically corrects and filters the existing dataset at scale, creating a higher-quality soccer game commentary dataset for training, denoted MatchTime. Third, based on our curated dataset, we train an automatic commentary generation model named MatchVoice. Extensive experiments and ablation studies demonstrate the effectiveness of our alignment pipeline, and training the model on the curated dataset achieves state-of-the-art performance for commentary generation, showing that better alignment can lead to significant performance improvements in downstream tasks.

Community

Paper author Paper submitter

Project Page: https://haoningwu3639.github.io/MatchTime/
Paper: https://arxiv.org/abs/2406.18530/
Code: https://github.com/jyrao/MatchTime/

To summarize, we make the following contributions:
(i) we demonstrate the effect of misalignment on the evaluation of automatic commentary generation by manually correcting the alignment errors in 49 soccer matches, which can later serve as a new benchmark for the community, termed SN-Caption-test-align;
(ii) we further propose a multi-modal temporal video-text alignment pipeline that corrects and filters existing soccer game commentary datasets at scale, resulting in a high-quality training dataset for commentary generation, named MatchTime;
(iii) we present a soccer game commentary model named MatchVoice, which achieves new state-of-the-art performance for automatic soccer game commentary generation.
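To make the idea of "correcting and filtering" misaligned commentary concrete, here is a minimal, hypothetical sketch. It is not the MatchTime pipeline, which is multi-modal and more involved (see the linked code). It assumes each commentary sentence and each video frame have already been embedded into a shared space by some contrastively trained encoder (not shown). Within a search window around the original, possibly wrong, timestamp, it moves the caption to the best-matching frame and drops the caption if no frame matches well. The names `realign_caption`, `window`, and `min_sim`, and their default values, are illustrative assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def realign_caption(
    text_emb: Sequence[float],
    frame_embs: Sequence[Sequence[float]],
    frame_times: Sequence[float],
    noisy_time: float,
    window: float = 45.0,   # hypothetical search radius in seconds
    min_sim: float = 0.5,   # hypothetical confidence threshold for keeping a caption
) -> Optional[Tuple[float, float]]:
    """Return (corrected_time, similarity), or None if the caption should be filtered out."""
    best: Optional[Tuple[float, float]] = None
    for emb, t in zip(frame_embs, frame_times):
        # Only consider frames near the original, possibly misaligned, timestamp.
        if abs(t - noisy_time) > window:
            continue
        sim = cosine(text_emb, emb)
        if best is None or sim > best[1]:
            best = (t, sim)
    # No frame in the window matches confidently: drop the caption.
    if best is None or best[1] < min_sim:
        return None
    return best
```

For example, with toy 2-D embeddings `frames = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]` at `times = [0.0, 10.0, 20.0, 30.0]`, calling `realign_caption([0.0, 1.0], frames, times, noisy_time=0.0)` returns `(20.0, 1.0)`, moving the caption from 0 s to 20 s. With `window=5.0` only the frame at 0 s is searched, its similarity is 0.0, and the caption is filtered out (`None`). Searching only a local window keeps the correction cheap and avoids matching a generic sentence to an unrelated moment far away in the match.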

Hi @haoningwu, congrats on this work!

Are you interested in uploading the datasets to the hub instead of Google Drive, making it easier for the community to discover your work? See here: https://huggingface.co/docs/datasets/image_dataset.

Also, are you planning to upload the model to the hub? See here for all details: https://huggingface.co/docs/hub/models-uploading. We also have Video-LLaVA available in the Transformers library: https://huggingface.co/LanguageBind/Video-LLaVA-7B-hf.

Paper author

The dataset and models have been uploaded; we will add more details in the future. FYR.

dataset: https://huggingface.co/datasets/Homie0609/MatchTime
model: https://huggingface.co/Homie0609/MatchTime/tree/main

Models citing this paper 0

Datasets citing this paper 1

Spaces citing this paper 0

Collections including this paper 2