Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions

[📜 Paper] [🖥️ Project Page] [📖 Dataset] [🤗 Model Weights]

[Image generated by DALL·E]

✅ TODO List

  • Write documentation (README)
  • Release M³C dataset
  • Release dialogue module weights
  • Release retrieval module weights
  • Release training code
  • Release inference code
  • Release model self-chat code
  • Launch Gradio demo for live chat

📚 Citation

@article{jang2025enabling,
  title={Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions},
  author={Jang, Jihyoung and Bae, Minwook and Kim, Minji and Hakkani-Tur, Dilek and Kim, Hyounghun},
  journal={arXiv preprint arXiv:2506.00421},
  year={2025}
}

Model tree for jihyoung/M3C-retrieval

Base model: Qwen/Qwen2-VL-2B
Finetuned: this model