---
license: apache-2.0
datasets:
- ardamamur/EgoExOR
language:
- en
metrics:
- f1
base_model:
- liuhaotian/llava-v1.5-7b
---
# EgoExOR Scene Graph Foundation Model

This repository hosts the foundation model for **surgical scene graph generation** trained on the [EgoExOR](https://huggingface.co/datasets/ardamamur/EgoExOR) dataset – a multimodal, multi-perspective dataset collected in a simulated operating room (OR) environment.

> **EgoExOR** stands for **Egocentric and Exocentric Operating Room**, integrating data from wearable AR glasses (egocentric) and static cameras (exocentric), enabling holistic modeling of complex surgical interactions.
## 🧠 Model Overview

The EgoExOR model is a dual-branch architecture that separately processes:

- **Egocentric inputs**: egocentric RGB video, hand tracking, gaze vectors, and audio
- **Exocentric inputs**: multi-view exocentric RGB-D video, point cloud data, and ultrasound imagery

Each branch employs transformer-based fusion before its embedding tokens are passed to a large language model (Vicuna-7B via LLaVA), which **autoregressively generates scene graph triplets** of the form **(subject, predicate, object)** – e.g., `(assistant, injecting, patient)`.
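The following PyTorch snippet is a minimal, illustrative sketch of the dual-branch idea described above – not the released implementation. Module names, token dimensions, and fusion depth are assumptions; the actual architecture is defined in the GitHub repo.

```python
import torch
import torch.nn as nn


class BranchFusion(nn.Module):
    """Fuses the per-modality tokens of one perspective (ego or exo) with a small transformer encoder."""

    def __init__(self, dim: int = 1024, depth: int = 2, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, modality_tokens: list[torch.Tensor]) -> torch.Tensor:
        # modality_tokens: list of (batch, n_i, dim) token sequences, one per modality
        tokens = torch.cat(modality_tokens, dim=1)
        return self.encoder(tokens)


class DualBranchConnector(nn.Module):
    """Ego and exo branches are fused separately, then projected into the LLM embedding space."""

    def __init__(self, dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.ego_branch = BranchFusion(dim)
        self.exo_branch = BranchFusion(dim)
        self.proj = nn.Linear(dim, llm_dim)  # maps fused tokens to LLM (e.g., Vicuna-7B) embeddings

    def forward(self, ego_tokens: list[torch.Tensor], exo_tokens: list[torch.Tensor]) -> torch.Tensor:
        fused = torch.cat([self.ego_branch(ego_tokens), self.exo_branch(exo_tokens)], dim=1)
        return self.proj(fused)  # (batch, n_ego + n_exo, llm_dim) prefix tokens for the LLM
```

The projected tokens are prepended to the text prompt, and the LLM then decodes the scene graph triplets autoregressively.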
## 📊 Benchmark Results

This model outperforms prior single-stream baselines such as [ORacle](https://arxiv.org/pdf/2404.07031) and [MM2SG](https://arxiv.org/pdf/2503.02579) by effectively leveraging perspective-specific signals. F1 is reported separately for the two procedures, Ultrasound Injection (UI) and MISS, and overall.

| Model              | UI F1    | MISS F1  | Overall F1 |
|--------------------|----------|----------|------------|
| ORacle (baseline)  | 0.72     | 0.64     | 0.67       |
| MM2SG (baseline)   | 0.79     | 0.66     | 0.72       |
| **EgoExOR (ours)** | **0.84** | **0.69** | **0.76**   |

For detailed benchmark results and dataset information, see the [paper](https://arxiv.org/abs/TODO) and the [GitHub repo](https://github.com/ardamamur/EgoExOR).
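As an illustration of how a triplet-level F1 of this kind can be computed, predicted and ground-truth (subject, predicate, object) triplets can be matched exactly per frame. This is a sketch under that assumption, not necessarily the paper's exact evaluation protocol, and the entity/predicate names beyond `(assistant, injecting, patient)` are made up for the example.

```python
from collections import Counter

Triplet = tuple[str, str, str]  # (subject, predicate, object)


def triplet_f1(pred: list[Triplet], gold: list[Triplet]) -> float:
    """Exact-match F1 between predicted and ground-truth triplets (multiset matching)."""
    pred_counts, gold_counts = Counter(pred), Counter(gold)
    tp = sum((pred_counts & gold_counts).values())  # triplets matched exactly
    if tp == 0:
        return 0.0
    precision = tp / sum(pred_counts.values())
    recall = tp / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy example using the triplet format shown above (names are illustrative):
pred = [("assistant", "injecting", "patient"), ("surgeon", "holding", "ultrasound_probe")]
gold = [("assistant", "injecting", "patient"), ("surgeon", "scanning", "patient")]
print(round(triplet_f1(pred, gold), 2))  # 0.5
```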
## 🗃️ Dataset

EgoExOR provides:
- 84,553 frames (94 minutes)
- 2 surgical procedures (Ultrasound Injection & MISS)
- 36 entities, 22 predicates
- Over 573,000 triplets
- Multimodal signals: RGB, depth, gaze, audio, ultrasound, point cloud, hand tracking

The dataset processing tools are available in the [GitHub repo](https://github.com/ardamamur/EgoExOR).
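To fetch the dataset files locally, the standard Hugging Face Hub download works as a starting point (a minimal sketch; the exact file layout, loaders, and preprocessing are documented in the GitHub repo, and the `local_dir` path below is an arbitrary choice):

```python
from huggingface_hub import snapshot_download

# Download the EgoExOR dataset repository from the Hugging Face Hub.
local_path = snapshot_download(
    repo_id="ardamamur/EgoExOR",
    repo_type="dataset",
    local_dir="./egoexor_data",
)
print(local_path)
```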