mselmangokmen committed · Commit 57191d4 · verified · 1 Parent(s): 0725b72

Update README.md

Files changed (1): README.md (+129 −98)
README.md CHANGED
@@ -52,16 +52,14 @@ This model is a Vision Transformer adapted for neuropathology tasks, developed u
  ## Model Details

  * **Model Type:** Vision Transformer (ViT) for neuropathology.
- * **Developed by:** [https://caai.ai.uky.edu], [Optional: in collaboration with the University of Kentucky [Specific Department/Center, e.g., Sanders-Brown Center on Aging]]
- * **Model Date:** [PLACEHOLDER: YYYY-MM-DD of model training completion or publication]
- * **Base Model Architecture (if applicable):** [PLACEHOLDER: e.g., DINOv2 ViT-S/14, ViT-B/14. Specify if registers are used, e.g., "Based on ViT-B/14 with 4 register tokens."]
- * **Input:** Image (e.g., patches from whole slide images).
- * **Output:** Class token and patch tokens [Optional: and register tokens]. These can be used for various downstream tasks (e.g., classification, segmentation, similarity search).
- * **Embedding Dimension:** [PLACEHOLDER: Specify for your ViT variant, e.g., 384 for ViT-S, 768 for ViT-B]
- * **Patch Size:** [PLACEHOLDER: e.g., 14 or 16. Confirm based on your model, e.g., "14 for a ViT with patch size 14."]
  * **Image Size Compatibility:**
- * The model was trained on images/patches of size [PLACEHOLDER: e.g., 224x224].
- * For an input of [PLACEHOLDER: e.g., 224x224] with a patch size of [PLACEHOLDER: e.g., 14], this results in 1 class token + ([PLACEHOLDER: e.g., 224]/[PLACEHOLDER: e.g., 14])^2 = [PLACEHOLDER: e.g., 256] patch tokens [Optional: + X register tokens].
  * The model can accept larger images provided the image dimensions are multiples of the patch size. If not, cropping to the closest smaller multiple may occur.
  * **License:** [PLACEHOLDER: Reiterate license chosen in YAML, e.g., Apache 2.0. Add link to full license if custom or 'other'.]
  * **Repository:** [PLACEHOLDER: Link to your model repository (e.g., GitHub, Hugging Face Hub)]
@@ -92,101 +90,134 @@ This model is intended for research purposes in the field of neuropathology.

  ## How to Get Started with the Model

- [PLACEHOLDER: Provide code snippets for loading and using your model. If available on Hugging Face, show an example using `transformers` or `torch.hub.load`.]

- Example using Hugging Face `transformers` (adjust based on your actual model and task):
  ```python
- # Ensure you have the necessary libraries installed:
- # pip install transformers torch Pillow

- from transformers import AutoImageProcessor, AutoModel # Or AutoModelForImageClassification
  import torch
  from PIL import Image
- import requests # For fetching image from URL if needed
-
- # Make sure to replace with your actual model identifier on the Hugging Face Hub
- # For example: model_id = "your-username/your-model-name"
- model_id = "[PLACEHOLDER: your-hf-hub-username/your-model-name]"
-
- # Load the processor and model
- try:
-     image_processor = AutoImageProcessor.from_pretrained(model_id)
-     # If your model is for a specific task like classification, use the appropriate AutoModel class
-     # model = AutoModelForImageClassification.from_pretrained(model_id)
-     model = AutoModel.from_pretrained(model_id) # For feature extraction
-     model.eval() # Set model to evaluation mode
- except Exception as e:
-     print(f"Error loading model or processor from Hugging Face Hub: {e}")
-     print(f"Please ensure '{model_id}' is a valid model identifier and you have an internet connection.")
-     # Fallback for placeholder if model_id is not set for demonstration
-     if model_id == "[PLACEHOLDER: your-hf-hub-username/your-model-name]":
-         print("Using a dummy model structure for demonstration as placeholder ID is used.")
-         # This is a dummy structure, not a functional model
-         from transformers import ViTConfig, ViTModel
-         config = ViTConfig(image_size=224, patch_size=14, num_labels=3, hidden_size=192, num_hidden_layers=12, num_attention_heads=3) # Minimal ViT-Tiny like
-         model = ViTModel(config) # Or ViTForImageClassification(config)
-         # A dummy processor
-         class DummyProcessor:
-             def __init__(self):
-                 self.size = {"height": 224, "width": 224}
-             def __call__(self, images, return_tensors=None):
-                 # Simplified dummy preprocessing
-                 return {"pixel_values": torch.randn(1, 3, self.size['height'], self.size['width'])}
-         image_processor = DummyProcessor()
-
-
- # Example: Load an image
- # Option 1: From a local path
- image_path = "[PLACEHOLDER: path/to/your/neuropathology_image.png]"
- # Option 2: From a URL (example)
- # image_url = "https://placehold.co/224x224/E6E6FA/800080?text=Sample\nImage" # Lilac background, purple text
- image_url = "https://placehold.co/224x224/cccccc/333333?text=Sample+Patch"
-
-
- try:
-     # image = Image.open(image_path).convert("RGB")
-     # Uncomment above line and comment below if using local path
-     image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
- except FileNotFoundError:
-     print(f"Image file not found at: {image_path}. Using a dummy image.")
-     image = Image.new('RGB', (image_processor.size['height'], image_processor.size['width']), color = 'skyblue')
- except Exception as e:
-     print(f"Error loading image: {e}. Using a dummy image.")
-     image = Image.new('RGB', (224, 224), color = 'skyblue') # Fallback size
-
- # Preprocess the image
- try:
-     inputs = image_processor(images=image, return_tensors="pt")
- except Exception as e:
-     print(f"Error during image processing: {e}")
-     inputs = {"pixel_values": torch.randn(1, 3, 224, 224)} # Fallback input
-
- # Perform inference
- with torch.no_grad():
-     try:
          outputs = model(**inputs)
-         # For feature extraction (AutoModel):
-         last_hidden_states = outputs.last_hidden_state
-         class_token_embedding = last_hidden_states[:, 0] # CLS token embedding
-         patch_embeddings = last_hidden_states[:, 1:] # Patch token embeddings (excluding CLS)
-         print("Class token embedding shape:", class_token_embedding.shape)
-         print("Patch embeddings shape:", patch_embeddings.shape)
-
-         # For classification (AutoModelForImageClassification):
-         # if hasattr(outputs, 'logits'):
-         #     logits = outputs.logits
-         #     predicted_class_idx = logits.argmax(-1).item()
-         #     # Assuming your model config has id2label mapping
-         #     if hasattr(model.config, 'id2label') and model.config.id2label:
-         #         print("Predicted class:", model.config.id2label[predicted_class_idx])
-         #     else:
-         #         print("Predicted class index:", predicted_class_idx)
-         # else:
-         #     print("Model output does not contain logits. Check if you are using the correct AutoModel class for your task.")
-
-     except Exception as e:
-         print(f"Error during model inference: {e}")
-
  ```

  ## Training Data
 
  ## Model Details

  * **Model Type:** Vision Transformer (ViT) for neuropathology.
+ * **Developed by:** Center for Applied Artificial Intelligence
+ * **Model Date:** 05/05/2025
+ * **Base Model Architecture:** DINOv2-Giant (ViT-G/14)
+ * **Input:** Image (224x224).
+ * **Embedding Dimension:** 1536
+ * **Patch Size:** 14
  * **Image Size Compatibility:**
+ * The model was trained on images/patches of size 224x224 (see the token-count sketch after this list).
  * The model can accept larger images provided the image dimensions are multiples of the patch size. If not, cropping to the closest smaller multiple may occur.
  * **License:** [PLACEHOLDER: Reiterate license chosen in YAML, e.g., Apache 2.0. Add link to full license if custom or 'other'.]
  * **Repository:** [PLACEHOLDER: Link to your model repository (e.g., GitHub, Hugging Face Hub)]
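
As a quick sanity check on the numbers above (an illustrative sketch, not part of the released code): with a patch size of 14, a 224×224 input yields 1 class token plus (224/14)² = 256 patch tokens, each of dimension 1536. The hypothetical helper below computes the token count for other input sizes, assuming dimensions that are not multiples of 14 are cropped down as described.

```python
# Illustrative only: token count for a ViT-G/14 input, assuming the cropping
# behaviour described above for dimensions that are not multiples of 14.
PATCH_SIZE = 14

def token_count(height: int, width: int, patch_size: int = PATCH_SIZE) -> int:
    """1 class token + one patch token per patch_size x patch_size tile."""
    h = (height // patch_size) * patch_size  # crop down to nearest multiple
    w = (width // patch_size) * patch_size
    return 1 + (h // patch_size) * (w // patch_size)

print(token_count(224, 224))  # 1 + 16*16 = 257 tokens, each 1536-dimensional
print(token_count(448, 448))  # 1 + 32*32 = 1025 tokens
```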
 
  ## How to Get Started with the Model

+ This model can extract embeddings from pathology images in three ways: with a Hugging Face image processor for standardized preprocessing, without explicit resizing to preserve the original image dimensions, or with forced 224×224 resizing for consistent inputs. All three approaches apply the same normalization; choose the one that best fits your data characteristics and research requirements.
  ```python
  import torch
  from PIL import Image
+ from transformers import AutoModel, AutoImageProcessor
+ from torchvision import transforms
+
+ def get_embeddings_with_processor(image_path, model_path, processor_path):
+     """
+     Extract embeddings using a HuggingFace image processor.
+     This approach handles normalization and resizing automatically.
+
+     Args:
+         image_path: Path to the image file
+         model_path: Path to the model directory
+         processor_path: Path to the processor config (a directory or a preprocessor JSON file)
+
+     Returns:
+         Image embeddings from the model
+     """
+     # Load model
+     model = AutoModel.from_pretrained(model_path)
+     model.eval()
+
+     # Load processor from config
+     image_processor = AutoImageProcessor.from_pretrained(processor_path)
+
+     # Process the image
+     with torch.no_grad():
+         image = Image.open(image_path).convert('RGB')
+         inputs = image_processor(images=image, return_tensors="pt")
          outputs = model(**inputs)
+         embeddings = outputs.last_hidden_state[:, 0, :]
+
+     return embeddings
+
+ def get_embeddings_direct(image_path, model_path, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
+     """
+     Extract embeddings directly without an image processor.
+     This approach works with various image resolutions, since the ViT
+     interpolates its position embeddings to handle different input sizes.
+
+     Args:
+         image_path: Path to the image file
+         model_path: Path to the model directory
+         mean: Normalization mean values
+         std: Normalization standard deviation values
+
+     Returns:
+         Image embeddings from the model
+     """
+     # Load model
+     model = AutoModel.from_pretrained(model_path)
+     model.eval()
+
+     # Define transformation - just converting to tensor and normalizing
+     transform = transforms.Compose([
+         transforms.ToTensor(),
+         transforms.Normalize(mean=mean, std=std)
+     ])
+
+     # Process the image
+     with torch.no_grad():
+         # Open image and convert to RGB
+         image = Image.open(image_path).convert('RGB')
+         # Convert image to tensor
+         image_tensor = transform(image).unsqueeze(0)  # Add batch dimension
+         # Feed to model
+         outputs = model(pixel_values=image_tensor)
+         # Get embeddings
+         embeddings = outputs.last_hidden_state[:, 0, :]
+
+     return embeddings
+
+ def get_embeddings_resized(image_path, model_path, size=(224, 224), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
+     """
+     Extract embeddings with explicit resizing to 224x224.
+     This approach ensures consistent input size regardless of original image dimensions.
+
+     Args:
+         image_path: Path to the image file
+         model_path: Path to the model directory
+         size: Target size for resizing (default: 224x224)
+         mean: Normalization mean values
+         std: Normalization standard deviation values
+
+     Returns:
+         Image embeddings from the model
+     """
+     # Load model
+     model = AutoModel.from_pretrained(model_path)
+     model.eval()
+
+     # Define transformation with explicit resize
+     transform = transforms.Compose([
+         transforms.Resize(size, interpolation=transforms.InterpolationMode.BICUBIC),
+         transforms.ToTensor(),
+         transforms.Normalize(mean=mean, std=std)
+     ])
+
+     # Process the image
+     with torch.no_grad():
+         image = Image.open(image_path).convert('RGB')
+         image_tensor = transform(image).unsqueeze(0)  # Add batch dimension
+         outputs = model(pixel_values=image_tensor)
+         embeddings = outputs.last_hidden_state[:, 0, :]
+
+     return embeddings
+
+ # Example usage
+ if __name__ == "__main__":
+     image_path = "test.jpg"
+     model_path = "outputs/training_test_3/teacher_checkpoints/iter_40"  # Local checkpoint directory (a Hub model id also works)
+     processor_path = "processor_config.json"  # Path to the processor config JSON (or a directory containing it)
+
+     # Method 1: Using image processor (recommended for consistency)
+     embeddings1 = get_embeddings_with_processor(image_path, model_path, processor_path)
+     print('Embedding shape (with processor):', embeddings1.shape)
+
+     # Method 2: Direct approach without resizing (works with various resolutions)
+     embeddings2 = get_embeddings_direct(image_path, model_path)
+     print('Embedding shape (direct):', embeddings2.shape)
+
+     # Method 3: With explicit resize to 224x224
+     embeddings3 = get_embeddings_resized(image_path, model_path)
+     print('Embedding shape (resized):', embeddings3.shape)
  ```
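
The CLS-token embeddings returned by these helpers can be fed directly into downstream analyses. As one illustrative example (hypothetical file names, reusing `get_embeddings_resized` and `model_path` from the script above), the snippet below compares two patches by cosine similarity, e.g. for a simple similarity search:

```python
# Illustrative downstream use, not part of the released code:
# cosine similarity between the CLS embeddings of two patches.
import torch.nn.functional as F

emb_a = get_embeddings_resized("patch_a.jpg", model_path)  # shape: (1, 1536)
emb_b = get_embeddings_resized("patch_b.jpg", model_path)  # shape: (1, 1536)

similarity = F.cosine_similarity(emb_a, emb_b).item()
print(f"Cosine similarity between patches: {similarity:.3f}")
```
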
  ## Training Data