Update README.md
README.md
@@ -2,10 +2,20 @@
license: cc-by-nc-sa-4.0
---

-# Bony

Medium article: https://hpai-bsc.medium.com/medium-article-bony-744fa41b452d

## Model Description
This XCiT (medium) model was trained from scratch for prostate histopathology image analysis tasks, using `224 × 224` pixel images and 24 NVIDIA H100 GPUs. XCiT (Cross-Covariance Image Transformer) is a vision transformer that replaces token-to-token self-attention with cross-covariance attention, which operates across feature channels rather than across tokens, so its cost grows linearly rather than quadratically with the number of image patches. It was pre-trained on a large dataset using the DINO self-supervised training method.
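XCiT's core building block is cross-covariance attention (XCA), introduced in the XCiT paper by El-Nouby et al. (2021): queries and keys are L2-normalized along the token axis, and attention is computed as a \(d \times d\) map between feature channels instead of an \(N \times N\) map between tokens. The NumPy sketch below illustrates a single XCA head with random, hypothetical projection weights and a fixed temperature (learnable in XCiT). It is not the Bony code: a full XCiT block also uses multiple heads, local patch-interaction layers and a feed-forward network.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_covariance_attention(x, w_q, w_k, w_v, temperature=1.0):
    """Single-head cross-covariance attention (XCA) sketch.

    x: (N, d) token features; w_q, w_k, w_v: (d, d) projection matrices (hypothetical).
    Returns the (N, d) output and the (d, d) channel attention map.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v  # each (N, d)
    # L2-normalise every feature channel across the N tokens
    q = q / np.linalg.norm(q, axis=0, keepdims=True)
    k = k / np.linalg.norm(k, axis=0, keepdims=True)
    # (d, d) attention between channels: its size does not depend on N
    attn = softmax((q.T @ k) * temperature, axis=-1)
    # Mix the channels of every token: (d, d) @ (d, N) -> (d, N), transposed back to (N, d)
    out = (attn @ v.T).T
    return out, attn


rng = np.random.default_rng(0)
n_tokens, dim = 196, 64  # illustrative sizes, not Bony's configuration
x = rng.standard_normal((n_tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
out, attn = cross_covariance_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (196, 64) (64, 64)
```

Because `attn` is always `dim × dim`, the cost of the two matrix products grows linearly with the number of tokens.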
@@ -74,7 +84,8 @@ The model achieved a classification accuracy of **81%** on the PANDA subset and

| Model | PANDA test subset (Accuracy) ↑ | DeepGleason (MSE) ↓ | SICAPv2 (MSE) ↓ |
|------------------|---------------------------------|---------------------|-----------------|
-| **Bony** | 81.2% | 2.934e-06 |
| **Hibou** | **83.1%** | 1.455e-06 | 0.10 |
| **Histoencoder** | 81.6% | **1.003e-06** | - |

@@ -85,65 +96,10 @@ As previously mentioned, histopathology images are highly discontinuous, noisy,
### Overview of 3D Wavelet Decomposition

-3D wavelet decomposition is a method well-suited for analyzing volumetric data, such as \(224 \times 224 \times 3\) images, by extracting localized information at different spatial scales.
-Wavelets are oscillating functions localized in both space and frequency, used to decompose a signal \( f(x, y, z) \) into multiple scales and orientations. The 3D wavelet transform is defined as:
-\[
-W_\psi f(j, \theta, x, y, z) = f \ast \psi_{j, \theta}(x, y, z),
-\]
-where \( \psi_{j, \theta} \) is a 3D wavelet indexed by:
-- \( j \): a scale defining the spatial resolution,
-- \( \theta \): a specific spatial orientation,
-and \( \ast \) denotes the 3D convolution operator.
-Common 3D wavelets include Morlet and Haar wavelets, which are effective for capturing directional variations.
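To make \(W_\psi f(j, \theta, x, y, z)\) concrete, here is a minimal NumPy sketch under simplifying assumptions: \(\psi_{j,\theta}\) is taken to be the scale-\(j\) discrete Haar wavelet applied along a single axis \(\theta\), boundaries are periodic, and there is no downsampling. The helper names are hypothetical, and this is not the filter bank used for BonyWave; true 3D wavelets such as separable Haar products or 3D Morlet wavelets also smooth along the remaining axes.

```python
import numpy as np


def haar_kernel(j: int) -> np.ndarray:
    """Discrete Haar wavelet at dyadic scale j (j >= 1): 2**(j-1) ones, then
    2**(j-1) minus ones, scaled to unit L2 norm."""
    half = 2 ** (j - 1)
    h = np.concatenate([np.ones(half), -np.ones(half)])
    return h / np.sqrt(2 * half)


def wavelet_transform(f: np.ndarray, j: int, theta: int) -> np.ndarray:
    """W_psi f(j, theta, .) = f * psi_{j, theta}, where psi_{j, theta} is the scale-j
    Haar wavelet applied along axis `theta` only (a simplified oriented filter).

    Circular (periodic) convolution without downsampling: output shape == f.shape.
    """
    h = haar_kernel(j)
    out = np.zeros(f.shape, dtype=float)
    for k, h_k in enumerate(h):
        out += h_k * np.roll(f, shift=k, axis=theta)  # sum_k h[k] * f[n - k]
    return out


rng = np.random.default_rng(0)
f = rng.random((224, 224, 3))  # stand-in for an RGB tile, axes (x, y, channel)
# Three dyadic scales x two spatial orientations (theta = 0: x axis, theta = 1: y axis)
coeffs = {(j, theta): wavelet_transform(f, j, theta) for j in (1, 2, 3) for theta in (0, 1)}
print(len(coeffs), coeffs[(2, 0)].shape)  # 6 (224, 224, 3)
```

Each entry of `coeffs` is one \((j, \theta)\) slice of the transform. Because every Haar kernel sums to zero, flat regions of the tile give zero response and only intensity changes along \(\theta\) are kept.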
-### 3D Scattering: Invariant Extension
-3D scattering is a method related to wavelet decomposition that produces representations invariant to transformations (e.g., translation, rotation). Histopathology images are thus mapped to representations that are stable under these transformations in the wavelet coefficient domain, which can help the model generalize.
-#### Step 1: Wavelet Decomposition
-A 3D wavelet is applied to extract first-order coefficients:
-\[
-U_1(x, y, z) = |f \ast \psi_{j_1, \theta_1}(x, y, z)|.
-\]
-#### Step 2: Higher-Level Coefficient Extraction
-The coefficients \( U_1 \) are further transformed to capture second-order information:
-\[
-U_2(x, y, z) = |U_1 \ast \psi_{j_2, \theta_2}(x, y, z)|.
-\]
-This process can be repeated across multiple levels \( m \), forming a hierarchical cascade.
-It is worth noting that these wavelet operations resemble CNNs: each level is a convolution followed by a pointwise non-linearity (here, the modulus). A wavelet cascade can therefore be viewed as a CNN whose filters are fixed in advance rather than learned.
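Steps 1 and 2 apply the same operation twice: convolve with a wavelet, then take the modulus. The sketch below assumes single-axis Haar filters with periodic boundaries, and its function names are hypothetical; each level is a fixed convolution followed by a pointwise non-linearity, which makes the CNN analogy concrete.

```python
import numpy as np


def haar(u: np.ndarray, j: int, theta: int) -> np.ndarray:
    """Circular convolution of u with the scale-j Haar wavelet along axis `theta`."""
    half = 2 ** (j - 1)
    # box[n] = u[n] + u[n-1] + ... + u[n-half+1]
    box = sum(np.roll(u, k, axis=theta) for k in range(half))
    # Haar response = current box sum minus the preceding box sum, unit L2 norm
    return (box - np.roll(box, half, axis=theta)) / np.sqrt(2 * half)


def scattering_cascade(f: np.ndarray, path) -> list:
    """Return [U_1, ..., U_m] for path = [(j_1, theta_1), ..., (j_m, theta_m)],
    with U_m = |U_{m-1} * psi_{j_m, theta_m}| and U_0 = f."""
    u, layers = f, []
    for j, theta in path:
        u = np.abs(haar(u, j, theta))  # convolution, then modulus non-linearity
        layers.append(u)
    return layers


rng = np.random.default_rng(0)
f = rng.random((224, 224, 3))  # stand-in for an RGB tile
U1, U2 = scattering_cascade(f, [(1, 0), (2, 1)])
```

With periodic boundaries, translating the input translates every \(U_m\) by the same amount (equivariance); invariance only appears after the aggregation step.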
-#### Step 3: Invariant Aggregation
-At each level, a non-linear operator is applied to create invariant representations (the following is an example of such an operation):
-\[
-S_m = \int |U_m| \, dx \, dy \, dz.
-\]
-These \( S_m \) coefficients can then be used for downstream tasks.
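Step 3 is where invariance comes from: summing \(|U_m|\) over all positions discards where a pattern occurs. The sketch below (hypothetical names; finest-scale Haar filters along single axes; periodic boundaries) replaces the integral with a sum over voxels and checks that a circular translation of the tile leaves \(S_1\) and \(S_2\) unchanged.

```python
import numpy as np


def haar1(u: np.ndarray, axis: int) -> np.ndarray:
    """Finest-scale Haar wavelet along one axis, circular boundary: (u[n] - u[n-1]) / sqrt(2)."""
    return (u - np.roll(u, 1, axis=axis)) / np.sqrt(2)


def scattering_features(f: np.ndarray) -> np.ndarray:
    """Return [S_1, S_2], where S_m is the sum of |U_m| over all voxels
    (a discrete stand-in for the integral over x, y, z)."""
    u1 = np.abs(haar1(f, axis=0))   # U_1 = |f * psi_1|
    u2 = np.abs(haar1(u1, axis=1))  # U_2 = |U_1 * psi_2|
    return np.array([u1.sum(), u2.sum()])


rng = np.random.default_rng(0)
f = rng.random((224, 224, 3))
translated = np.roll(f, shift=(17, 40), axis=(0, 1))  # circular translation of the tile
print(np.allclose(scattering_features(f), scattering_features(translated)))  # True
```

With zero-padded or reflected borders the equality becomes approximate, since content shifted past the edge changes the coefficients.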
-This idea still requires further testing.
-### Testing the Idea
-We conducted small-scale experiments using Haar wavelets, considering a single decomposition scale and focusing on the "Approximation" of the image.
-Despite these limitations, training showed some potential: on the PANDA subset benchmark, **Bony_wave** achieved 83% accuracy on the test set.

## Limitations and Biases
@@ -154,5 +110,3 @@ Although this model was trained for a specific prostate histopathology analysis
- This model should not be used for images other than **prostate histopathology** images, as it has only been trained on this kind of data.
- This model shall not be used for diagnosis alone.

-## Conclusion
-The **XCiT** model pre-trained with **DINO** shows results competitive with the other benchmarked models for prostate histopathology image analysis. By using the DINO method for self-supervised learning, the model learns robust representations without explicit supervision. However, it is important to continue validating this model on diverse datasets to ensure its effectiveness in various clinical contexts.

license: cc-by-nc-sa-4.0
---

+# Bony & BonyWave Model Card
+Self-Supervised Vision Transformers for Prostate Histopathology Analysis

Medium article: https://hpai-bsc.medium.com/medium-article-bony-744fa41b452d

+## Model Overview
+This repository hosts two variants of the XCiT-medium model trained for prostate histopathology image analysis:
+* Bony: Baseline XCiT model pre-trained with DINO.
+* BonyWave: Enhanced variant incorporating 3D wavelet decomposition for improved feature extraction.
+Both models process 224×224 RGB tiles and were trained on 2.8M image tiles from the PANDA dataset using 24× NVIDIA H100 GPUs.

## Model Description
This XCiT (medium) model was trained from scratch for prostate histopathology image analysis tasks, using `224 × 224` pixel images and 24 NVIDIA H100 GPUs. XCiT (Cross-Covariance Image Transformer) is a vision transformer that replaces token-to-token self-attention with cross-covariance attention, which operates across feature channels rather than across tokens, so its cost grows linearly rather than quadratically with the number of image patches. It was pre-trained on a large dataset using the DINO self-supervised training method.
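DINO's training signal is a cross-entropy between a teacher network's output on one crop and a student network's output on another crop of the same image; the teacher output is centred and sharpened with a low temperature, and the teacher's weights are an exponential moving average of the student's. The NumPy sketch below shows only the loss and the centre update, with random logits standing in for network outputs. The temperatures and momentum are typical values from the DINO paper, not Bony's documented hyperparameters, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def dino_loss(student_out, teacher_out, center, tau_s=0.1, tau_t=0.04):
    """DINO-style cross-entropy between teacher and student outputs.

    student_out: (V_s, B, K) logits for all V_s crops; teacher_out: (V_t, B, K) logits
    for the V_t global crops, assumed to also be the first V_t student crops; center: (K,).
    """
    p_t = np.exp(log_softmax((teacher_out - center) / tau_t))  # centring + sharpening
    log_p_s = log_softmax(student_out / tau_s)
    losses = []
    for i_t in range(teacher_out.shape[0]):
        for i_s in range(student_out.shape[0]):
            if i_s == i_t:  # never compare a crop with itself
                continue
            losses.append(-(p_t[i_t] * log_p_s[i_s]).sum(axis=-1).mean())
    return float(np.mean(losses))


def update_center(center, teacher_out, momentum=0.9):
    """EMA of the mean teacher logits; together with sharpening, helps avoid collapse."""
    batch_mean = teacher_out.reshape(-1, teacher_out.shape[-1]).mean(axis=0)
    return momentum * center + (1 - momentum) * batch_mean


rng = np.random.default_rng(0)
n_global, n_local, batch, k = 2, 4, 8, 16  # illustrative sizes
student = rng.standard_normal((n_global + n_local, batch, k))
teacher = rng.standard_normal((n_global, batch, k))
center = np.zeros(k)
loss = dino_loss(student, teacher, center)
center = update_center(center, teacher)
```

In full DINO training only the student receives gradients; the teacher is then updated as an exponential moving average of the student's weights.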

| Model | PANDA test subset (Accuracy) ↑ | DeepGleason (MSE) ↓ | SICAPv2 (MSE) ↓ |
|------------------|---------------------------------|---------------------|-----------------|
+| **Bony** | 81.2% | 2.934e-06 | 8.0e-04 |
+| **BonyWave** | 83.0% | 3.9e-04 | **7.9e-04** |
| **Hibou** | **83.1%** | 1.455e-06 | 0.10 |
| **Histoencoder** | 81.6% | **1.003e-06** | - |

### Overview of 3D Wavelet Decomposition

+Wavelets are oscillating functions localized in both space and frequency, used to decompose a signal \( f(x, y, z) \) into multiple scales and orientations. 3D wavelet decomposition is well suited to analyzing volumetric data, such as \(224 \times 224 \times 3\) images, because it extracts localized information at different spatial scales.

+We conducted small-scale experiments using Haar wavelets, considering a single decomposition scale and focusing on the "Approximation" of the image. Despite these limitations, training showed some potential: on the PANDA subset benchmark, **BonyWave** achieved 83% accuracy on the test set. For more details, see https://hpai-bsc.medium.com/medium-article-bony-744fa41b452d
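The README does not say exactly how the Approximation is fed to BonyWave. As a hedged illustration only, assuming a single-level separable 3D Haar transform over (height, width, channel) and padding the odd-length channel axis by repeating the last channel, the sketch below splits a \(224 \times 224 \times 3\) tile into 8 sub-bands and keeps the all-low-pass `"aaa"` band as the Approximation. The `a`/`d` key naming mirrors PyWavelets' `dwtn`, but the code uses only NumPy, its function names are hypothetical, and its boundary handling may not match PyWavelets exactly.

```python
import numpy as np


def haar_split(x: np.ndarray, axis: int):
    """One Haar analysis step along `axis`: returns (approximation, detail), half length each."""
    n = x.shape[axis]
    if n % 2:  # odd length (e.g. 3 colour channels): repeat the last slice (assumed boundary rule)
        x = np.concatenate([x, np.take(x, [n - 1], axis=axis)], axis=axis)
        n += 1
    even = np.take(x, np.arange(0, n, 2), axis=axis)
    odd = np.take(x, np.arange(1, n, 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def haar_dwt3(x: np.ndarray) -> dict:
    """Single-level separable 3D Haar DWT; each key has one 'a' or 'd' per axis."""
    bands = {"": x}
    for axis in range(3):
        bands = {
            key + tag: part
            for key, arr in bands.items()
            for tag, part in zip("ad", haar_split(arr, axis))
        }
    return bands


rng = np.random.default_rng(0)
tile = rng.random((224, 224, 3))  # stand-in for one RGB tile
bands = haar_dwt3(tile)
approx = bands["aaa"]             # the single-scale "Approximation"
print(len(bands), approx.shape)   # 8 (112, 112, 2)
```

The Approximation is a half-resolution, smoothed copy of the tile; the seven detail bands hold the edges and texture that are set aside when focusing on the Approximation alone.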

## Limitations and Biases
- This model should not be used for images other than **prostate histopathology** images, as it has only been trained on this kind of data.
- This model shall not be used for diagnosis alone.