---
license: apache-2.0
datasets:
- ardamamur/EgoExOR
language:
- en
metrics:
- f1
base_model:
- liuhaotian/llava-v1.5-7b
---

# EgoExOR Scene Graph Foundation Model

This repository hosts the foundation model for **surgical scene graph generation** trained on the [EgoExOR](https://huggingface.co/datasets/ardamamur/EgoExOR) dataset – a multimodal, multi-perspective dataset collected in a simulated operating room (OR) environment.

> **EgoExOR** stands for **Egocentric and Exocentric Operating Room**, integrating data from wearable AR glasses (egocentric) and static cameras (exocentric), enabling holistic modeling of complex surgical interactions.

## 🧠 Model Overview

The EgoExOR model is a dual-branch architecture that separately processes:

- **Egocentric inputs**: Egocentric RGB video, hand tracking, gaze vectors, and audio
- **Exocentric inputs**: Multiview exocentric RGB-D video, point cloud data, and ultrasound imagery

Each branch employs transformer-based fusion before the embedding tokens are passed to a large language model (Vicuna-7B via LLaVA) to **autoregressively generate scene graph triplets**:
**(subject, predicate, object)** – e.g., `(assistant, injecting, patient)`

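Because triplets are emitted as free text, they must be parsed back into structured form for downstream use. The snippet below is only an illustrative sketch under the assumption that the model prints triplets in the `(subject, predicate, object)` style shown above; the exact output format and the official post-processing are defined by the training code in the GitHub repo.

```python
import re

# Illustrative parser for generated triplet text, e.g.
# "(assistant, injecting, patient); (nurse, holding, instrument)".
# The separator and exact formatting here are assumptions, not the official format.
TRIPLET_RE = re.compile(r"\(\s*([^,()]+)\s*,\s*([^,()]+)\s*,\s*([^,()]+)\s*\)")

def parse_triplets(generated_text: str) -> list[tuple[str, str, str]]:
    """Extract (subject, predicate, object) tuples from the model's output text."""
    return [(s.strip(), p.strip(), o.strip())
            for s, p, o in TRIPLET_RE.findall(generated_text)]

print(parse_triplets("(assistant, injecting, patient); (nurse, holding, instrument)"))
# -> [('assistant', 'injecting', 'patient'), ('nurse', 'holding', 'instrument')]
```
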
## 📊 Benchmark Results

This model outperforms prior single-stream baselines like [ORacle](https://arxiv.org/pdf/2404.07031) and [MM2SG](https://arxiv.org/pdf/2503.02579) by effectively leveraging perspective-specific signals.

| Model              | UI F1    | MISS F1  | Overall F1 |
|--------------------|----------|----------|------------|
| ORacle (Baseline)  | 0.72     | 0.64     | 0.67       |
| MM2SG (Baseline)   | 0.79     | 0.66     | 0.72       |
| **EgoExOR (Ours)** | **0.84** | **0.69** | **0.76**   |

*UI and MISS refer to the dataset's two procedures (Ultrasound Injection and MISS); F1 is reported per procedure and overall.*

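The scores above compare predicted and ground-truth triplet sets. As a rough illustration of how such a set-based F1 can be computed for a single frame (the official evaluation script in the GitHub repo may differ in matching rules and aggregation):

```python
# Rough sketch of a set-based triplet F1 for one frame. The official evaluation
# (matching rules, per-procedure aggregation) lives in the GitHub repo and may differ.
def triplet_f1(predicted: set[tuple[str, str, str]],
               ground_truth: set[tuple[str, str, str]]) -> float:
    if not predicted and not ground_truth:
        return 1.0  # nothing to predict, nothing predicted
    tp = len(predicted & ground_truth)  # exact triplet matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

gt = {("assistant", "injecting", "patient"), ("nurse", "holding", "instrument")}
pred = {("assistant", "injecting", "patient")}
print(round(triplet_f1(pred, gt), 2))  # 0.67
```
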
For detailed benchmark results and dataset information, see the [paper](https://arxiv.org/abs/TODO) and [GitHub repo](https://github.com/ardamamur/EgoExOR).

## 🗃️ Dataset
EgoExOR provides:
- 84,553 frames (94 mins)
- 2 surgical procedures (Ultrasound Injection & MISS)
- 36 entities, 22 predicates
- Over 573,000 triplets
- Multimodal signals: RGB, depth, gaze, audio, ultrasound, point cloud, hand tracking

You can find the dataset processing tools in the [GitHub repo](https://github.com/ardamamur/EgoExOR).

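To experiment locally, the raw dataset snapshot can be fetched from the Hub. A minimal sketch using `huggingface_hub`, assuming only the `ardamamur/EgoExOR` dataset repo referenced above (the official loading and processing tools are in the GitHub repo):

```python
# Minimal sketch: download the EgoExOR dataset snapshot from the Hugging Face Hub.
# Only the repo id from this card is assumed; use the GitHub repo's tools for
# the official processing pipeline.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ardamamur/EgoExOR", repo_type="dataset")
print(f"EgoExOR dataset downloaded to: {local_dir}")
```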