HPAI-BSC/Bony

dariog committed on Feb 3

Commit

5c7fa52

verified ·

1 Parent(s): f382e0a

Update README.md

Files changed (1)

README.md +15 -61

@@ -2,10 +2,20 @@
 license: cc-by-nc-sa-4.0
 ---
-# Bony's Model Card
 Medium article: https://hpai-bsc.medium.com/medium-article-bony-744fa41b452d
 ## Model Description
 This XCiT (medium) model was trained from scratch for prostate histopathology image analysis tasks, using images of size `224 × 224` pixels and 24 H100 GPUs. The XCiT architecture is a transformer model that uses cross-covariance attention (attention computed across feature channels rather than across tokens) to process images, which can improve performance compared to traditional CNN architectures. It was pre-trained on a large dataset using the DINO self-supervised training method.
@@ -74,7 +84,8 @@ The model achieved a classification accuracy of **81%** on the PANDA subset and
 | Model            | PANDA test subset (Accuracy) ↑ | DeepGleason (MSE) ↓ | SICAPv2 (MSE) ↓ |
 |------------------|---------------------------------|---------------------|-----------------|
-| **Bony**         | 81.2%                           | 2.934e-06           | **0.0008**      |
 | **Hibou**        | **83.1%**                       | 1.455e-06           | 0.10            |
 | **Histoencoder** | 81.6%                           | **1.003e-06**       | -               |
@@ -85,65 +96,10 @@ As previously mentioned, histopathology images are highly discontinuous, noisy,
 ### Overview of 3D Wavelet Decomposition
-3D wavelet decomposition is a method well-suited for analyzing volumetric data, such as \(224 \times 224 \times 3\) images, by extracting localized information at different spatial scales.
-Wavelets are oscillating functions localized in time and space, used to decompose a signal \( f(x, y, z) \) into multiple scales and orientations. The 3D wavelet transform is defined as:
-\[
-W_\psi f(j, \theta, x, y, z) = f \ast \psi_{j, \theta}(x, y, z),
-\]
-where \( \psi_{j, \theta} \) is a 3D wavelet with:
-- \( j \): a scale defining the spatial resolution,
-- \( \theta \): a specific spatial orientation,
-- \( \ast \): the 3D convolution operator.
-Common 3D wavelets include Morlet and Haar wavelets, which are effective for capturing directional variations.
-### 3D Scattering: Invariant Extension
-3D scattering is a method related to wavelet decomposition that produces representations invariant to transformations (e.g., translation, rotation). This ensures that histopathology images are invariant in the wavelet coefficient domain, thereby enabling better generalization.
-#### Step 1: Wavelet Decomposition
-A 3D wavelet is applied to extract first-scale coefficients:
-\[
-U_1(x, y, z) = |f \ast \psi_{j_1, \theta_1}(x, y, z)|.
-\]
-#### Step 2: Higher-Level Coefficient Extraction
-The coefficients \( U_1 \) are further transformed to capture secondary information:
-\[
-U_2(x, y, z) = |U_1 \ast \psi_{j_2, \theta_2}(x, y, z)|.
-\]
-This process can be repeated across multiple levels \( m \), forming a hierarchical cascade.
-It is worth noting that these wavelet operations share similarities with CNNs, where convolution layers are applied. This highlights that wavelet decomposition is foundational to computer vision based on CNNs.
-#### Step 3: Invariant Aggregation
-At each level, a non-linear operator is applied to create invariant representations (the following is an example of such an operation):
-\[
-S_m = \int |U_m| \, dx \, dy \, dz.
-\]
-These \( S_m \) coefficients can then be used for downstream tasks.
-Having introduced this idea, further testing is needed.
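The three steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the wavelet \( \psi_{j, \theta} \) is replaced by a crude stand-in (a circular finite difference at lag \( 2^j \) along axis \( \theta \)), the integral in \( S_m \) becomes a plain sum, and the names `wavelet_modulus` and `scattering_path` are hypothetical.

```python
import numpy as np

def wavelet_modulus(x, scale, axis):
    """Stand-in for |x * psi_{j,theta}|: a circular finite difference
    at lag 2**scale along `axis` (an assumption, not a true Haar/Morlet filter)."""
    shifted = np.roll(x, -(2 ** scale), axis=axis)
    return np.abs(x - shifted) / 2.0

def scattering_path(f, path):
    """Return [S_1, ..., S_m] for one path of (scale, axis) pairs."""
    u = np.asarray(f, dtype=np.float64)
    coeffs = []
    for scale, axis in path:
        u = wavelet_modulus(u, scale, axis)  # U_m = |U_{m-1} * psi_{j_m, theta_m}|
        coeffs.append(float(u.sum()))        # S_m = discrete version of the integral of |U_m|
    return coeffs

rng = np.random.default_rng(0)
f = rng.random((8, 8, 3))                    # toy volume, not a real tile
path = [(0, 0), (1, 1)]                      # (j_1, theta_1), (j_2, theta_2)

s = scattering_path(f, path)
s_shifted = scattering_path(np.roll(f, 3, axis=0), path)
s_const = scattering_path(np.ones((8, 8, 3)), path)
```

Because every operation here is circular, a circularly shifted copy of `f` yields the same coefficients as `f`, which is the translation invariance that Step 3 aims for; a constant volume yields all-zero coefficients since the stand-in filter has zero mean.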
-### Testing the Idea
-We conducted small-scale experiments using Haar wavelets, considering a single decomposition scale and focusing on the "Approximation" of the image.
-Despite these limitations, training revealed some potential. We tested this idea on the PANDA subset benchmark, and **Bony_wave** achieved 83% accuracy on the test set.
 ## Limitations and Biases
@@ -154,5 +110,3 @@ Although this model was trained for a specific prostate histopathology analysis
 - This model may not be used for images other than **prostate histopathology** images as it has only been trained on this kind of data.
 - This model shall not be used for diagnosis alone.
-## Conclusion
-The **XCiT** model pre-trained with **DINO** shows promising results for prostate histopathology image analysis compared to other models. By using the DINO method for self-supervised learning, the model learns robust representations without explicit supervision. However, it is important to continue validating this model on diverse datasets to ensure its effectiveness in various clinical contexts.

 license: cc-by-nc-sa-4.0
 ---
+# Bony & BonyWave Model Card
+Self-Supervised Vision Transformers for Prostate Histopathology Analysis
 Medium article: https://hpai-bsc.medium.com/medium-article-bony-744fa41b452d
+## Model Overview
+This repository hosts two variants of the XCiT-medium model trained for prostate histopathology image analysis:
+* Bony: Baseline XCiT model pre-trained with DINO.
+* BonyWave: Enhanced variant incorporating 3D wavelet decomposition for improved feature extraction.
+Both models process 224×224 RGB tiles and were trained on 2.8M image tiles from the PANDA dataset using 24× NVIDIA H100 GPUs.
 ## Model Description
 This XCiT (medium) model was trained from scratch for prostate histopathology image analysis tasks, using images of size `224 × 224` pixels and 24 H100 GPUs. The XCiT architecture is a transformer model that uses cross-covariance attention (attention computed across feature channels rather than across tokens) to process images, which can improve performance compared to traditional CNN architectures. It was pre-trained on a large dataset using the DINO self-supervised training method.
 | Model            | PANDA test subset (Accuracy) ↑ | DeepGleason (MSE) ↓ | SICAPv2 (MSE) ↓ |
 |------------------|---------------------------------|---------------------|-----------------|
+| **Bony**         | 81.2%                           | 2.934e-06           | 8.0e-04         |
+| **BonyWave**     | 83.0%                           | 3.9e-04             | **7.9e-04**     |
 | **Hibou**        | **83.1%**                       | 1.455e-06           | 0.10            |
 | **Histoencoder** | 81.6%                           | **1.003e-06**       | -               |
 ### Overview of 3D Wavelet Decomposition
+Wavelets are oscillating functions localized in time and space, used to decompose a signal \( f(x, y, z) \) into multiple scales and orientations. 3D wavelet decomposition is a method well-suited for analyzing volumetric data, such as \(224 \times 224 \times 3\) images, by extracting localized information at different spatial scales.
+We conducted small-scale experiments using Haar wavelets, with a single decomposition scale and only the "Approximation" (low-pass) sub-band of the image. Despite these limitations, training showed some potential: on the PANDA subset benchmark, **BonyWave** achieved 83% accuracy on the test set. For more details, see https://hpai-bsc.medium.com/medium-article-bony-744fa41b452d
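As a rough illustration of what a single-scale Haar "Approximation" of a \(224 \times 224 \times 3\) tile looks like, here is a minimal NumPy sketch. It is not the BonyWave preprocessing code: the function name `haar_approx_3d` is hypothetical, and the handling of the odd-length channel axis (repeating the last slice) is an assumption, since the model card does not say how that axis is treated.

```python
import numpy as np

def haar_approx_3d(volume):
    """Single-level 3D Haar approximation: low-pass filter and
    downsample by 2 along each of the three axes in turn."""
    out = np.asarray(volume, dtype=np.float64)
    for axis in range(3):
        # Assumption: pad odd-length axes by repeating the last slice.
        if out.shape[axis] % 2 == 1:
            pad = [(0, 0)] * 3
            pad[axis] = (0, 1)
            out = np.pad(out, pad, mode="edge")
        even = np.take(out, np.arange(0, out.shape[axis], 2), axis=axis)
        odd = np.take(out, np.arange(1, out.shape[axis], 2), axis=axis)
        # Orthonormal Haar low-pass: (a + b) / sqrt(2) on each adjacent pair.
        out = (even + odd) / np.sqrt(2.0)
    return out

tile = np.ones((224, 224, 3))   # placeholder tile, not real histopathology data
approx = haar_approx_3d(tile)
print(approx.shape)             # (112, 112, 2)
```

Each pass halves one axis, so the approximation sub-band holds a coarse, smoothed copy of the tile. A library such as PyWavelets provides full multi-band decompositions if the detail sub-bands are also needed.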
 ## Limitations and Biases
 - This model may not be used for images other than **prostate histopathology** images as it has only been trained on this kind of data.
 - This model shall not be used for diagnosis alone.