mselmangokmen committed · Commit 57191d4 · verified · 1 Parent(s): 0725b72

Update README.md

Files changed (1): README.md (+129 −98)
README.md CHANGED
@@ -52,16 +52,14 @@ This model is a Vision Transformer adapted for neuropathology tasks, developed u
  ## Model Details

  * **Model Type:** Vision Transformer (ViT) for neuropathology.
- * **Developed by:** [https://caai.ai.uky.edu], [Optional: in collaboration with the University of Kentucky [Specific Department/Center, e.g., Sanders-Brown Center on Aging]]
- * **Model Date:** [PLACEHOLDER: YYYY-MM-DD of model training completion or publication]
- * **Base Model Architecture (if applicable):** [PLACEHOLDER: e.g., DINOv2 ViT-S/14, ViT-B/14. Specify if registers are used, e.g., "Based on ViT-B/14 with 4 register tokens."]
- * **Input:** Image (e.g., patches from whole slide images).
- * **Output:** Class token and patch tokens [Optional: and register tokens]. These can be used for various downstream tasks (e.g., classification, segmentation, similarity search).
- * **Embedding Dimension:** [PLACEHOLDER: Specify for your ViT variant, e.g., 384 for ViT-S, 768 for ViT-B]
- * **Patch Size:** [PLACEHOLDER: e.g., 14 or 16. Confirm based on your model, e.g., "14 for a ViT with patch size 14."]
  * **Image Size Compatibility:**
- * The model was trained on images/patches of size [PLACEHOLDER: e.g., 224x224].
- * For an input of [PLACEHOLDER: e.g., 224x224] with a patch size of [PLACEHOLDER: e.g., 14], this results in 1 class token + ([PLACEHOLDER: e.g., 224]/[PLACEHOLDER: e.g., 14])^2 = [PLACEHOLDER: e.g., 256] patch tokens [Optional: + X register tokens].
  * The model can accept larger images provided the image dimensions are multiples of the patch size. If not, cropping to the closest smaller multiple may occur.
  * **License:** [PLACEHOLDER: Reiterate license chosen in YAML, e.g., Apache 2.0. Add link to full license if custom or 'other'.]
  * **Repository:** [PLACEHOLDER: Link to your model repository (e.g., GitHub, Hugging Face Hub)]
@@ -92,101 +90,134 @@ This model is intended for research purposes in the field of neuropathology.

  ## How to Get Started with the Model

- [PLACEHOLDER: Provide code snippets for loading and using your model. If available on Hugging Face, show an example using `transformers` or `torch.hub.load`.]

- Example using Hugging Face `transformers` (adjust based on your actual model and task):
  ```python
- # Ensure you have the necessary libraries installed:
- # pip install transformers torch Pillow

- from transformers import AutoImageProcessor, AutoModel # Or AutoModelForImageClassification
  import torch
  from PIL import Image
- import requests # For fetching image from URL if needed
-
- # Make sure to replace with your actual model identifier on the Hugging Face Hub
- # For example: model_id = "your-username/your-model-name"
- model_id = "[PLACEHOLDER: your-hf-hub-username/your-model-name]"
-
- # Load the processor and model
- try:
-     image_processor = AutoImageProcessor.from_pretrained(model_id)
-     # If your model is for a specific task like classification, use the appropriate AutoModel class
-     # model = AutoModelForImageClassification.from_pretrained(model_id)
-     model = AutoModel.from_pretrained(model_id) # For feature extraction
-     model.eval() # Set model to evaluation mode
- except Exception as e:
-     print(f"Error loading model or processor from Hugging Face Hub: {e}")
-     print(f"Please ensure '{model_id}' is a valid model identifier and you have an internet connection.")
-     # Fallback for placeholder if model_id is not set for demonstration
-     if model_id == "[PLACEHOLDER: your-hf-hub-username/your-model-name]":
-         print("Using a dummy model structure for demonstration as placeholder ID is used.")
-         # This is a dummy structure, not a functional model
-         from transformers import ViTConfig, ViTModel
-         config = ViTConfig(image_size=224, patch_size=14, num_labels=3, hidden_size=192, num_hidden_layers=12, num_attention_heads=3) # Minimal ViT-Tiny like
-         model = ViTModel(config) # Or ViTForImageClassification(config)
-         # A dummy processor
-         class DummyProcessor:
-             def __init__(self):
-                 self.size = {"height": 224, "width": 224}
-             def __call__(self, images, return_tensors=None):
-                 # Simplified dummy preprocessing
-                 return {"pixel_values": torch.randn(1, 3, self.size['height'], self.size['width'])}
-         image_processor = DummyProcessor()
-
-
- # Example: Load an image
- # Option 1: From a local path
- image_path = "[PLACEHOLDER: path/to/your/neuropathology_image.png]"
- # Option 2: From a URL (example)
- # image_url = "https://placehold.co/224x224/E6E6FA/800080?text=Sample\nImage" # Lilac background, purple text
- image_url = "https://placehold.co/224x224/cccccc/333333?text=Sample+Patch"
-
-
- try:
-     # image = Image.open(image_path).convert("RGB")
-     # Uncomment above line and comment below if using local path
-     image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
- except FileNotFoundError:
-     print(f"Image file not found at: {image_path}. Using a dummy image.")
-     image = Image.new('RGB', (image_processor.size['height'], image_processor.size['width']), color = 'skyblue')
- except Exception as e:
-     print(f"Error loading image: {e}. Using a dummy image.")
-     image = Image.new('RGB', (224, 224), color = 'skyblue') # Fallback size
-
- # Preprocess the image
- try:
-     inputs = image_processor(images=image, return_tensors="pt")
- except Exception as e:
-     print(f"Error during image processing: {e}")
-     inputs = {"pixel_values": torch.randn(1, 3, 224, 224)} # Fallback input
-
- # Perform inference
- with torch.no_grad():
-     try:
          outputs = model(**inputs)
-         # For feature extraction (AutoModel):
-         last_hidden_states = outputs.last_hidden_state
-         class_token_embedding = last_hidden_states[:, 0] # CLS token embedding
-         patch_embeddings = last_hidden_states[:, 1:] # Patch token embeddings (excluding CLS)
-         print("Class token embedding shape:", class_token_embedding.shape)
-         print("Patch embeddings shape:", patch_embeddings.shape)
-
-         # For classification (AutoModelForImageClassification):
-         # if hasattr(outputs, 'logits'):
-         #     logits = outputs.logits
-         #     predicted_class_idx = logits.argmax(-1).item()
-         #     # Assuming your model config has id2label mapping
-         #     if hasattr(model.config, 'id2label') and model.config.id2label:
-         #         print("Predicted class:", model.config.id2label[predicted_class_idx])
-         #     else:
-         #         print("Predicted class index:", predicted_class_idx)
-         # else:
-         #     print("Model output does not contain logits. Check if you are using the correct AutoModel class for your task.")
-
-     except Exception as e:
-         print(f"Error during model inference: {e}")
-
  ```

  ## Training Data
 
  ## Model Details

  * **Model Type:** Vision Transformer (ViT) for neuropathology.
+ * **Developed by:** Center for Applied Artificial Intelligence
+ * **Model Date:** 05/05/2025
+ * **Base Model Architecture:** DINOv2-Giant (ViT-G/14)
+ * **Input:** Image (224x224).
+ * **Embedding Dimension:** 1536
+ * **Patch Size:** 14
  * **Image Size Compatibility:**
+ * The model was trained on images/patches of size 224x224 (see the token-count sketch after this list).
  * The model can accept larger images provided the image dimensions are multiples of the patch size. If not, cropping to the closest smaller multiple may occur.
  * **License:** [PLACEHOLDER: Reiterate license chosen in YAML, e.g., Apache 2.0. Add link to full license if custom or 'other'.]
  * **Repository:** [PLACEHOLDER: Link to your model repository (e.g., GitHub, Hugging Face Hub)]
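
As a quick sanity check on the numbers above (an illustrative sketch, not part of the released code): with a patch size of 14, a 224×224 input yields 1 class token plus (224/14)² = 256 patch tokens, each of dimension 1536. The hypothetical helper below computes the token count for other input sizes, assuming dimensions that are not multiples of 14 are cropped down as described.

```python
# Illustrative only: token count for a ViT-G/14 input, assuming the cropping
# behaviour described above for dimensions that are not multiples of 14.
PATCH_SIZE = 14

def token_count(height: int, width: int, patch_size: int = PATCH_SIZE) -> int:
    """1 class token + one patch token per patch_size x patch_size tile."""
    h = (height // patch_size) * patch_size  # crop down to nearest multiple
    w = (width // patch_size) * patch_size
    return 1 + (h // patch_size) * (w // patch_size)

print(token_count(224, 224))  # 1 + 16*16 = 257 tokens, each 1536-dimensional
print(token_count(448, 448))  # 1 + 32*32 = 1025 tokens
```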
 
  ## How to Get Started with the Model

+ This model can extract embeddings from pathology images in three ways: with a Hugging Face image processor for standardized preprocessing, without explicit resizing to preserve the original image dimensions, or with forced 224×224 resizing for consistent inputs. All three approaches apply the same normalization; choose the one that best fits your data characteristics and research requirements.
  ```python
  import torch
  from PIL import Image
+ from transformers import AutoModel, AutoImageProcessor
+ from torchvision import transforms
+
+ def get_embeddings_with_processor(image_path, model_path, processor_path):
+     """
+     Extract embeddings using a HuggingFace image processor.
+     This approach handles normalization and resizing automatically.
+
+     Args:
+         image_path: Path to the image file
+         model_path: Path to the model directory
+         processor_path: Path to the processor config (a directory or a preprocessor JSON file)
+
+     Returns:
+         Image embeddings from the model
+     """
+     # Load model
+     model = AutoModel.from_pretrained(model_path)
+     model.eval()
+
+     # Load processor from config
+     image_processor = AutoImageProcessor.from_pretrained(processor_path)
+
+     # Process the image
+     with torch.no_grad():
+         image = Image.open(image_path).convert('RGB')
+         inputs = image_processor(images=image, return_tensors="pt")
          outputs = model(**inputs)
+         embeddings = outputs.last_hidden_state[:, 0, :]
+
+     return embeddings
+
+ def get_embeddings_direct(image_path, model_path, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
+     """
+     Extract embeddings directly without an image processor.
+     This approach works with various image resolutions, since the ViT
+     interpolates its position embeddings to handle different input sizes.
+
+     Args:
+         image_path: Path to the image file
+         model_path: Path to the model directory
+         mean: Normalization mean values
+         std: Normalization standard deviation values
+
+     Returns:
+         Image embeddings from the model
+     """
+     # Load model
+     model = AutoModel.from_pretrained(model_path)
+     model.eval()
+
+     # Define transformation - just converting to tensor and normalizing
+     transform = transforms.Compose([
+         transforms.ToTensor(),
+         transforms.Normalize(mean=mean, std=std)
+     ])
+
+     # Process the image
+     with torch.no_grad():
+         # Open image and convert to RGB
+         image = Image.open(image_path).convert('RGB')
+         # Convert image to tensor
+         image_tensor = transform(image).unsqueeze(0)  # Add batch dimension
+         # Feed to model
+         outputs = model(pixel_values=image_tensor)
+         # Get embeddings
+         embeddings = outputs.last_hidden_state[:, 0, :]
+
+     return embeddings
+
+ def get_embeddings_resized(image_path, model_path, size=(224, 224), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):
+     """
+     Extract embeddings with explicit resizing to 224x224.
+     This approach ensures consistent input size regardless of original image dimensions.
+
+     Args:
+         image_path: Path to the image file
+         model_path: Path to the model directory
+         size: Target size for resizing (default: 224x224)
+         mean: Normalization mean values
+         std: Normalization standard deviation values
+
+     Returns:
+         Image embeddings from the model
+     """
+     # Load model
+     model = AutoModel.from_pretrained(model_path)
+     model.eval()
+
+     # Define transformation with explicit resize
+     transform = transforms.Compose([
+         transforms.Resize(size, interpolation=transforms.InterpolationMode.BICUBIC),
+         transforms.ToTensor(),
+         transforms.Normalize(mean=mean, std=std)
+     ])
+
+     # Process the image
+     with torch.no_grad():
+         image = Image.open(image_path).convert('RGB')
+         image_tensor = transform(image).unsqueeze(0)  # Add batch dimension
+         outputs = model(pixel_values=image_tensor)
+         embeddings = outputs.last_hidden_state[:, 0, :]
+
+     return embeddings
+
+ # Example usage
+ if __name__ == "__main__":
+     image_path = "test.jpg"
+     model_path = "outputs/training_test_3/teacher_checkpoints/iter_40"  # Local checkpoint directory (a Hub model id also works)
+     processor_path = "processor_config.json"  # Path to the processor config JSON (or a directory containing it)
+
+     # Method 1: Using image processor (recommended for consistency)
+     embeddings1 = get_embeddings_with_processor(image_path, model_path, processor_path)
+     print('Embedding shape (with processor):', embeddings1.shape)
+
+     # Method 2: Direct approach without resizing (works with various resolutions)
+     embeddings2 = get_embeddings_direct(image_path, model_path)
+     print('Embedding shape (direct):', embeddings2.shape)
+
+     # Method 3: With explicit resize to 224x224
+     embeddings3 = get_embeddings_resized(image_path, model_path)
+     print('Embedding shape (resized):', embeddings3.shape)
  ```
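
The CLS-token embeddings returned by these helpers can be fed directly into downstream analyses. As one illustrative example (hypothetical file names, reusing `get_embeddings_resized` and `model_path` from the script above), the snippet below compares two patches by cosine similarity, e.g. for a simple similarity search:

```python
# Illustrative downstream use, not part of the released code:
# cosine similarity between the CLS embeddings of two patches.
import torch.nn.functional as F

emb_a = get_embeddings_resized("patch_a.jpg", model_path)  # shape: (1, 1536)
emb_b = get_embeddings_resized("patch_b.jpg", model_path)  # shape: (1, 1536)

similarity = F.cosine_similarity(emb_a, emb_b).item()
print(f"Cosine similarity between patches: {similarity:.3f}")
```
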
  ## Training Data