jihyoung/M3C-retrieval


Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions

[📜 Paper] [🖥️ Project Page] [📖 Dataset] [🤗 Model Weights]

(Image generated by DALL·E)

✅ TODO List

Write documentation (README)
Release M³C dataset
Release dialogue module weight
Release retrieval module weight
Release training code
Release inference code
Release model self-chat code
Launch Gradio demo for live chat

📚 Citation

@article{jang2025enabling,
  title={Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions},
  author={Jang, Jihyoung and Bae, Minwook and Kim, Minji and Hakkani-Tur, Dilek and Kim, Hyounghun},
  journal={arXiv preprint arXiv:2506.00421},
  year={2025}
}

Model tree for jihyoung/M3C-retrieval

Base model: Qwen/Qwen2-VL-2B
Fine-tuned from the base model: Qwen/Qwen2-VL-2B-Instruct
Fine-tuned from Qwen/Qwen2-VL-2B-Instruct: this model
