Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
[Paper] [Project Page] [Dataset] [Model Weights]
Image generated by DALL·E
TODO List
- Write documentation (README)
- Release M³C dataset
- Release dialogue module weights
- Release retrieval module weights
- Release training code
- Release inference code
- Release model self-chat code
- Launch Gradio demo for live chat
Citation
@article{jang2025enabling,
title={Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions},
author={Jang, Jihyoung and Bae, Minwook and Kim, Minji and Hakkani-Tur, Dilek and Kim, Hyounghun},
journal={arXiv preprint arXiv:2506.00421},
year={2025}
}