dariog committed on
Commit
5c7fa52
·
verified ·
1 Parent(s): f382e0a

Update README.md

Files changed (1)
  1. README.md +15 -61
README.md CHANGED
@@ -2,10 +2,20 @@
2
  license: cc-by-nc-sa-4.0
3
  ---
4
 
5
- # Bony's Model Card
 
 
6
 
7
  Medium article: https://hpai-bsc.medium.com/medium-article-bony-744fa41b452d
8
 
9
  ## Model Description
10
  This XCiT (medium) model was trained from scratch for prostate histopathology image analysis tasks, using `224 × 224` pixel images and 24 H100 GPUs. XCiT is a transformer architecture that uses cross-covariance attention (attention computed across feature channels rather than across tokens) to process images, which can improve performance compared to traditional CNN architectures. It was pre-trained on a large dataset using the DINO self-supervised training method.
11
 
@@ -74,7 +84,8 @@ The model achieved a classification accuracy of **81%** on the PANDA subset and
74
 
75
  | Model | PANDA test subset (Accuracy) ↑ | DeepGleason (MSE) ↓ | SICAPv2 (MSE) ↓ |
76
  |------------------|---------------------------------|---------------------|-----------------|
77
- | **Bony** | 81.2% | 2.934e-06 | **0.0008** |
 
78
  | **Hibou** | **83.1%** | 1.455e-06 | 0.10 |
79
  | **Histoencoder** | 81.6% | **1.003e-06** | - |
80
 
@@ -85,65 +96,10 @@ As previously mentioned, histopathology images are highly discontinuous, noisy,
85
 
86
  ### Overview of 3D Wavelet Decomposition
87
 
88
- 3D wavelet decomposition is a method well-suited for analyzing volumetric data, such as \(224 \times 224 \times 3\) images, by extracting localized information at different spatial scales.
89
-
90
- Wavelets are oscillating functions localized in time and space, used to decompose a signal \( f(x, y, z) \) into multiple scales and orientations. The 3D wavelet transform is defined as:
91
-
92
- \[
93
- W_\psi f(j, \theta, x, y, z) = f \ast \psi_{j, \theta}(x, y, z),
94
- \]
95
-
96
- where \( \psi_{j, \theta} \) is a 3D wavelet with:
97
-
98
- - \( j \): a scale defining the spatial resolution,
99
- - \( \theta \): a specific spatial orientation,
100
- - \( \ast \): the 3D convolution operator.
101
-
102
- Common 3D wavelets include Morlet and Haar wavelets, which are effective for capturing directional variations.
103
-
104
- ### 3D Scattering: Invariant Extension
105
-
106
- 3D scattering is a method related to wavelet decomposition that produces representations invariant to transformations (e.g., translation, rotation). This ensures that histopathology images are invariant in the wavelet coefficient domain, thereby enabling better generalization.
107
-
108
- #### Step 1: Wavelet Decomposition
109
- A 3D wavelet is applied to extract first-scale coefficients:
110
-
111
- \[
112
- U_1(x, y, z) = |f \ast \psi_{j_1, \theta_1}(x, y, z)|.
113
- \]
114
-
115
- #### Step 2: Higher-Level Coefficient Extraction
116
- The coefficients \( U_1 \) are further transformed to capture secondary information:
117
-
118
- \[
119
- U_2(x, y, z) = |U_1 \ast \psi_{j_2, \theta_2}(x, y, z)|.
120
- \]
121
-
122
- This process can be repeated across multiple levels \( m \), forming a hierarchical cascade.
123
-
124
- It is worth noting that these wavelet operations share similarities with CNNs, where convolution layers are applied. This highlights that wavelet decomposition is foundational to computer vision based on CNNs.
125
-
126
- #### Step 3: Invariant Aggregation
127
- At each level, a non-linear operator is applied to create invariant representations (the following is an example of such an operation):
128
-
129
- \[
130
- S_m = \int |U_m| \, dx \, dy \, dz.
131
- \]
132
-
133
- These \( S_m \) coefficients can then be used for downstream tasks.
134
-
135
- Having introduced this idea, further testing is needed.
136
-
137
- ### Testing the Idea
138
-
139
- We conducted small-scale experiments using Haar wavelets, considering a single decomposition scale and focusing on the "Approximation" of the image.
140
-
141
- Despite these limitations, training revealed some potential. We tested this idea on the PANDA subset benchmark, and **Bony_wave** achieved 83% accuracy on the test set.
142
-
143
-
144
-
145
 
 
146
 
 
147
 
148
 
149
  ## Limitations and Biases
@@ -154,5 +110,3 @@ Although this model was trained for a specific prostate histopathology analysis
154
  - This model should not be used on images other than **prostate histopathology** images, as it has only been trained on this kind of data.
155
  - This model must not be used as the sole basis for a diagnosis.
156
 
157
- ## Conclusion
158
- The **XCiT** model pre-trained with **DINO** shows promising results for prostate histopathology image analysis compared to other models. By using the DINO method for self-supervised learning, the model learns robust representations without explicit supervision. However, it is important to continue validating this model on diverse datasets to ensure its effectiveness in various clinical contexts.
 
2
  license: cc-by-nc-sa-4.0
3
  ---
4
 
5
+ # Bony & BonyWave Model Card
6
+
7
+ Self-Supervised Vision Transformers for Prostate Histopathology Analysis
8
 
9
  Medium article: https://hpai-bsc.medium.com/medium-article-bony-744fa41b452d
10
 
11
+ ## Model Overview
12
+
13
+ This repository hosts two variants of the XCiT-medium model trained for prostate histopathology image analysis:
14
+ * Bony: Baseline XCiT model pre-trained with DINO.
15
+ * BonyWave: Enhanced variant incorporating 3D wavelet decomposition for improved feature extraction.
16
+
17
+ Both models process 224×224 RGB tiles and were trained on 2.8M image tiles from the PANDA dataset using 24× NVIDIA H100 GPUs.
18
+
19
  ## Model Description
20
  This XCiT (medium) model was trained from scratch for prostate histopathology image analysis tasks, using `224 × 224` pixel images and 24 H100 GPUs. XCiT is a transformer architecture that uses cross-covariance attention (attention computed across feature channels rather than across tokens) to process images, which can improve performance compared to traditional CNN architectures. It was pre-trained on a large dataset using the DINO self-supervised training method.
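
  To make the attention mechanism concrete, here is a minimal single-head NumPy sketch of cross-covariance attention as used in XCiT. It is an illustration only, not this repository's code: the function name, the random projection matrices, the token count, and the `temperature` scalar are all assumptions (in XCiT the temperature is learned).

```python
import numpy as np

def cross_covariance_attention(x, w_q, w_k, w_v, temperature=1.0):
    """Single-head cross-covariance attention sketch.

    x: (N, d) array of N patch tokens with d features each.
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical, random here).
    Returns the (N, d) output and the (d, d) channel attention map.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # L2-normalise each feature channel along the token axis.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-12)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-12)

    # The attention map is d x d (channels x channels), not N x N (tokens x tokens).
    logits = (k.T @ q) / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)  # each column sums to 1

    return v @ attn, attn

# Example: 196 tokens (a 14 x 14 patch grid, assumed) with 64 features.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((196, 64))
w_q, w_k, w_v = (rng.standard_normal((64, 64)) for _ in range(3))
out, attn = cross_covariance_attention(tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

  Because the attention map has size d × d, its cost grows linearly with the number of tokens rather than quadratically.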
21
 
 
84
 
85
  | Model | PANDA test subset (Accuracy) ↑ | DeepGleason (MSE) ↓ | SICAPv2 (MSE) ↓ |
86
  |------------------|---------------------------------|---------------------|-----------------|
87
+ | **Bony** | 81.2% | 2.934e-06 | 8.0e-04 |
88
+ | **BonyWave** | 83.0% | 3.9e-04 | **7.9e-04** |
89
  | **Hibou** | **83.1%** | 1.455e-06 | 0.10 |
90
  | **Histoencoder** | 81.6% | **1.003e-06** | - |
91
 
 
96
 
97
  ### Overview of 3D Wavelet Decomposition
98
 
99
 
100
+ Wavelets are oscillating functions localized in both space and frequency, used to decompose a signal \( f(x, y, z) \) into multiple scales and orientations. 3D wavelet decomposition is well suited to volumetric data, such as \(224 \times 224 \times 3\) image tiles, because it extracts localized information at different spatial scales.
101
 
102
+ We conducted small-scale experiments using Haar wavelets, restricted to a single decomposition scale and to the approximation (low-pass) coefficients of the image. Despite these limitations, training showed some potential: on the PANDA subset benchmark, **BonyWave** achieved 83% accuracy on the test set. For more details, see https://hpai-bsc.medium.com/medium-article-bony-744fa41b452d
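
  As a rough illustration of what a single-scale Haar approximation looks like, the sketch below computes the low-pass band of a tile with NumPy. It is not the preprocessing used for the model: it assumes a 2D orthonormal Haar transform applied to each colour channel independently, and the function name and random input tile are hypothetical.

```python
import numpy as np

def haar_approximation(image):
    """Single-level 2D Haar approximation (low-pass band), per channel.

    image: (H, W, C) array with even H and W.
    Returns an (H/2, W/2, C) array. With the orthonormal Haar filter,
    each output value is the sum of a 2 x 2 pixel block divided by 2.
    """
    h, w, c = image.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Group pixels into non-overlapping 2 x 2 blocks, then sum each block.
    blocks = image.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.sum(axis=(1, 3)) / 2.0

# Example with a random 224 x 224 RGB tile (stand-in for a real tile).
tile = np.random.default_rng(0).random((224, 224, 3))
approx = haar_approximation(tile)
print(approx.shape)
```

  The approximation halves each spatial dimension while keeping the coarse structure of the tile. The detail bands (horizontal, vertical, diagonal) are discarded in this sketch.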
103
 
104
 
105
  ## Limitations and Biases
 
110
  - This model should not be used on images other than **prostate histopathology** images, as it has only been trained on this kind of data.
111
  - This model must not be used as the sole basis for a diagnosis.
112