Papers
arxiv:2505.04642

Rethinking Multimodal Sentiment Analysis: A High-Accuracy, Simplified Fusion Architecture

Published on May 5
Authors:
,

Abstract

A lightweight deep learning model using modality-specific encoders and simple concatenation achieves high emotion classification accuracy on the IEMOCAP dataset.

AI-generated summary

Multimodal sentiment analysis, a pivotal task in affective computing, seeks to understand human emotions by integrating cues from language, audio, and visual signals. While many recent approaches leverage complex attention mechanisms and hierarchical architectures, we propose a lightweight, yet effective fusion-based deep learning model tailored for utterance-level emotion classification. Using the benchmark IEMOCAP dataset, which includes aligned text, audio-derived numeric features, and visual descriptors, we design a modality-specific encoder using fully connected layers followed by dropout regularization. The modality-specific representations are then fused using simple concatenation and passed through a dense fusion layer to capture cross-modal interactions. This streamlined architecture avoids computational overhead while preserving performance, achieving a classification accuracy of 92% across six emotion categories. Our approach demonstrates that with careful feature engineering and modular design, simpler fusion strategies can outperform or match more complex models, particularly in resource-constrained environments.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2505.04642 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2505.04642 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2505.04642 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.