See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Paper • 2512.02231 • Published
What Limits Agentic Systems Efficiency?