PavanKumarAmbadapudi/DiabeticRetinopathy_Hybrid-ViT

+# 🏥 Diabetic Retinopathy Severity Classification
+This model is a **Hybrid Vision Transformer (ViT)** that uses **EfficientNet B0** as its CNN backbone. It is trained to classify the severity of **diabetic retinopathy (DR)** in retinal fundus images into five stages.
+## 📌 Model Overview
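+- **Backbone**: EfficientNet B0 (feature extractor initialized from ImageNet weights via `timm`; its global-average-pooled output is a 1280-dimensional vector)
+- **Head**: a transformer encoder block (multi-head self-attention + feed-forward network) followed by an MLP classifier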
+- **Input Size**: 224x224 (RGB Images)
+- **Output Classes**:
+  - 0: No Diabetic Retinopathy
+  - 1: Mild
+  - 2: Moderate
+  - 3: Severe
+  - 4: Proliferative Diabetic Retinopathy
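The overview above can be made concrete with a shape walk-through. This is a simplified sketch, not the repository's `model.py`: a hypothetical `StubBackbone` (one strided convolution plus global average pooling) stands in for EfficientNet B0 so it runs without `timm` or pretrained weights, and the LayerNorms, feed-forward layer and MLP head are collapsed into a single linear classifier. Only the tensor shapes are meant to match the real model.

```python
import torch
import torch.nn as nn

EMBED_DIM = 1280   # size of EfficientNet B0's pooled feature vector
NUM_CLASSES = 5    # DR grades 0-4

class StubBackbone(nn.Module):
    """Hypothetical stand-in for EfficientNet B0: image -> (batch, 1280) pooled features."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, EMBED_DIM, kernel_size=32, stride=32)  # 224x224 -> 7x7 grid
        self.pool = nn.AdaptiveAvgPool2d(1)                              # global average pool

    def forward(self, x):
        return self.pool(self.conv(x)).flatten(1)

backbone = StubBackbone()
attn = nn.MultiheadAttention(EMBED_DIM, num_heads=8)  # sequence-first input by default
classifier = nn.Linear(EMBED_DIM, NUM_CLASSES)         # simplified head

images = torch.randn(2, 3, 224, 224)                   # batch of 2 RGB images
features = backbone(images)                            # (2, 1280)
tokens = features.unsqueeze(1).permute(1, 0, 2)        # (1, 2, 1280): one token per image
attended, _ = attn(tokens, tokens, tokens)             # (1, 2, 1280)
logits = classifier(attended.squeeze(0))               # (2, 5)

print(features.shape, tokens.shape, logits.shape)
```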
+---
+## 🚀 How to Use This Model
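+### **1️⃣ Download the Model**
+Make sure you have **PyTorch**, **Torchvision**, and **timm** installed (`model.py` imports `timm` for the EfficientNet B0 backbone), then clone the repository and navigate to it:
+```bash
+pip install torch torchvision timm
+git lfs install  # Hugging Face stores large files such as the .pth weights with Git LFS
+git clone https://huggingface.co/PavanKumarAmbadapudi/DiabeticRetinopathy_Hybrid-ViT
+cd DiabeticRetinopathy_Hybrid-ViT
+```
+Alternatively, download the two files manually: `Hybrid_ViT.pth` (weights) and `model.py` (model definition).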
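If you prefer not to clone with Git, the `huggingface_hub` library can fetch individual files. This is a minimal sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`) and that both files sit at the repository root as listed above; `download_model_files` is a hypothetical helper name, not part of this repository.

```python
def download_model_files(repo_id="PavanKumarAmbadapudi/DiabeticRetinopathy_Hybrid-ViT"):
    """Hypothetical helper: download the weights and model definition, return local paths."""
    # Imported lazily so this helper can be defined without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    files = ["Hybrid_ViT.pth", "model.py"]
    # hf_hub_download stores each file in the local Hugging Face cache and returns its path.
    return {name: hf_hub_download(repo_id=repo_id, filename=name) for name in files}
```

Usage: `paths = download_model_files()`. The returned paths point into the local cache, so copy `model.py` next to your script (or add its folder to `sys.path`) before running `from model import CNNViT`.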
+### **2️⃣ Load the Model in Python**
+```python
+import torch
+from model import CNNViT
+model = CNNViT(num_classes=5)
+model.load_state_dict(torch.load("Hybrid_ViT.pth", map_location=torch.device('cpu')))
+model.eval()
+```
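The loading step expects `Hybrid_ViT.pth` to hold a plain `state_dict`. The sketch below reproduces that pattern on a small stand-in network (`TinyNet` is hypothetical, not part of this repository): it saves a `state_dict`, reloads it onto the CPU with `weights_only=True` (available since PyTorch 1.13; it restricts unpickling to tensors and plain containers), and moves the model to a GPU when one is available.

```python
import os
import tempfile

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for CNNViT, used only to demonstrate saving and loading."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(x)

source = TinyNet()
path = os.path.join(tempfile.mkdtemp(), "tiny.pth")
torch.save(source.state_dict(), path)  # save only the parameters, as a state_dict

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
restored = TinyNet()
state_dict = torch.load(path, map_location="cpu", weights_only=True)
restored.load_state_dict(state_dict)  # strict by default: raises on missing/unexpected keys
restored.to(device).eval()            # move to GPU if available and disable dropout

x = torch.randn(3, 8)
with torch.no_grad():
    same = torch.allclose(source(x), restored(x.to(device)).cpu())
print(same)
```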
+### **3️⃣ Perform Inference**
+To make predictions on an image:
+```python
+import torch
+from PIL import Image
+import torchvision.transforms as transforms
+def map_prediction(prediction):
+    # Map a class index (0-4) to its severity label
+    mapping = {
+        0: "No DR",
+        1: "Mild",
+        2: "Moderate",
+        3: "Severe",
+        4: "Proliferative DR"
+    }
+    return mapping.get(prediction, "Unknown")
+def getTransformations(image_path):
+    # Deterministic preprocessing for inference. Random augmentations such as
+    # RandomResizedCrop, RandomHorizontalFlip and ColorJitter belong in training only;
+    # at inference they would make the prediction change from run to run.
+    transform = transforms.Compose([
+        transforms.Resize((224, 224)),
+        transforms.ToTensor(),
+        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # ImageNet RGB mean and std
+    ])
+    image = Image.open(image_path).convert("RGB")
+    return transform(image).unsqueeze(0)  # add batch dimension: (1, 3, 224, 224)
+def predict_model_Hybrid(model, image_tensor):
+    model.eval()  # make sure dropout is disabled
+    with torch.no_grad():
+        outputs = model(image_tensor)  # raw logits, shape (1, 5)
+        probabilities = torch.softmax(outputs, dim=1)
+        confidence, predicted_class = probabilities.max(dim=1)
+    return {"label": map_prediction(predicted_class.item()), "confidence": confidence.item()}
+image_path = 'Path_to_Your_Image'  # replace with the path to a fundus image
+image_tensor = getTransformations(image_path)
+print("Hybrid ViT", predict_model_Hybrid(model, image_tensor))
+```
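The post-processing in `predict_model_Hybrid` is a softmax over the five logits followed by an argmax. The pure-Python sketch below (no PyTorch needed) shows the same arithmetic; the logits are made up for illustration and are not real model outputs.

```python
import math

LABELS = ["No DR", "Mild", "Moderate", "Severe", "Proliferative DR"]

def softmax(logits):
    # Subtract the largest logit for numerical stability; the probabilities are unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits):
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
    return {"label": LABELS[best], "confidence": probs[best]}

# Illustrative logits for one image (not real model output)
result = postprocess([0.5, 2.0, 0.1, -1.0, -0.3])
print(result["label"], round(result["confidence"], 3))
```

The reported confidence is the softmax probability of the top class; it is not necessarily a calibrated probability of disease.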
+## 📊 Training Details
+This model was trained on the **APTOS 2019 Blindness Detection** dataset using **5-fold cross-validation**, which gives a more reliable estimate of generalization than a single train/validation split. EfficientNet B0 serves as the feature extractor, and its pooled features are passed to the transformer-based classification head.
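The model card does not say how the folds were built (plain or stratified, shuffled or not). As a minimal sketch, assuming plain shuffled K-fold over image indices, the split logic looks like this; `kfold_indices` is a hypothetical helper. In practice `sklearn.model_selection.KFold` or `StratifiedKFold` (which keeps the class proportions similar across folds, useful when DR grades are imbalanced) are the usual tools.

```python
import random

def kfold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs; every sample is used for validation exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(kfold_indices(n_samples=23, k=5))
for fold, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {fold}: train={len(train_idx)} val={len(val_idx)}")
```

Each fold trains a fresh model on the training indices and evaluates it on the held-out indices; the per-fold scores are then averaged.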
+## 🛠️ Hyperparameters
+| Parameter     | Value |
+|--------------|-------|
+| **Image Size**  | 224x224 |
+| **Batch Size**  | 32 |
+| **Epochs**      | 5 |
+| **K-Folds**     | 5 |
+| **Learning Rate** | 1e-4 |
+| **Optimizer**  | Adam |
+| **Scheduler**  | StepLR (Step=10, Gamma=0.5) |
+| **Loss Function** | CrossEntropyLoss |
+| **Device** | `CUDA` (if available) |
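`StepLR` multiplies the learning rate by `gamma` every `step_size` scheduler steps, so `lr = base_lr * gamma ** (epoch // step_size)`. Assuming `scheduler.step()` is called once per epoch (the table does not say), the learning rate stays at 1e-4 for all 5 epochs of each fold, and the first halving would only occur at epoch 10. The closed form can be checked in plain Python:

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Closed-form learning rate of torch.optim.lr_scheduler.StepLR after `epoch` steps."""
    return base_lr * gamma ** (epoch // step_size)

BASE_LR = 1e-4
EPOCHS = 5

schedule = [step_lr(BASE_LR, e) for e in range(EPOCHS)]
print(schedule)               # learning rate used in each of the 5 epochs
print(step_lr(BASE_LR, 10))   # first epoch at which the rate would be halved
```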
+## 📬 Contact
+For any queries, reach out to me at:
+📧 [email protected]

model.py ADDED

	@@ -0,0 +1,64 @@

+import torch
+import torch.nn as nn
+import timm
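+class TransformerBlock(nn.Module):
+    # Post-norm transformer encoder block applied to the pooled CNN feature vector
+    def __init__(self, embed_dim=1280, num_heads=8, ff_dim=3072, dropout=0.1):
+        super().__init__()
+        self.attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout)  # expects (seq, batch, embed)
+        self.norm1 = nn.LayerNorm(embed_dim)
+        self.norm2 = nn.LayerNorm(embed_dim)
+        self.ffn = nn.Sequential(
+            nn.Linear(embed_dim, ff_dim),
+            nn.GELU(),
+            nn.Linear(ff_dim, embed_dim),
+            nn.Dropout(dropout)
+        )
+    def forward(self, x):
+        # x: (batch, embed_dim) -> (batch, 1, embed_dim) -> (1, batch, embed_dim)
+        x = x.unsqueeze(1)
+        x = x.permute(1, 0, 2)
+        # The sequence holds a single token, so the softmax attention weight is always 1
+        # (before attention dropout) and the output reduces to the projected value vector.
+        attn_output, _ = self.attn(x, x, x)
+        x = self.norm1(x + attn_output)  # residual connection + LayerNorm
+        ffn_output = self.ffn(x)
+        x = self.norm2(x + ffn_output)   # residual connection + LayerNorm
+        x = x.permute(1, 0, 2)           # back to (batch, 1, embed_dim)
+        return x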
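+class EfficientNetBackbone(nn.Module):
+    def __init__(self):
+        super().__init__()
+        # ImageNet-pretrained EfficientNet B0 from timm. num_classes=0 drops the classifier
+        # and global_pool='avg' returns one pooled vector per image: (batch, 1280).
+        self.model = timm.create_model('efficientnet_b0', pretrained=True, num_classes=0, global_pool='avg')
+        self.out_features = 1280
+    def forward(self, x):
+        x = self.model(x)  # (batch, 3, 224, 224) -> (batch, 1280)
+        return x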
+class CNNViT(nn.Module):
+    def __init__(self, num_classes=5):
+        super().__init__()
+        self.cnn_backbone = EfficientNetBackbone()
+        self.transformer = TransformerBlock(embed_dim=1280, num_heads=8, ff_dim=3072)
+        # MLP classifier: 1280 -> 512 -> 256 -> num_classes logits
+        self.fc = nn.Sequential(
+            nn.Linear(1280, 512),
+            nn.ReLU(),
+            nn.Dropout(0.3),
+            nn.Linear(512, 256),
+            nn.ReLU(),
+            nn.Dropout(0.3),
+            nn.Linear(256, num_classes)
+        )
+    def forward(self, x):
+        x = self.cnn_backbone(x)  # (batch, 1280)
+        x = self.transformer(x)   # (batch, 1, 1280)
+        x = x.squeeze(1)          # (batch, 1280)
+        x = self.fc(x)            # (batch, num_classes) logits
+        return x
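+if __name__ == "__main__":
+    # Build the model only when this file is run directly, so that
+    # `from model import CNNViT` does not construct an extra model on import.
+    model_Hybrid = CNNViT(num_classes=5)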
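Because `EfficientNetBackbone` global-average-pools the feature map, `TransformerBlock` receives a sequence of length 1. With a single key, the softmax attention weight is always 1, so `nn.MultiheadAttention` reduces to the value projection followed by the output projection. The check below demonstrates this on a small, randomly initialized layer (embedding size 16 instead of 1280 to keep it fast). It relies on PyTorch's packed `in_proj_weight` layout (query, key and value rows stacked in that order), which applies when query, key and value share the embedding size, as they do here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embed_dim, num_heads, batch = 16, 4, 3

attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=0.1)
attn.eval()  # disable attention dropout so the comparison is deterministic

x = torch.randn(1, batch, embed_dim)  # (seq_len=1, batch, embed_dim), as in TransformerBlock

with torch.no_grad():
    attn_output, weights = attn(x, x, x)  # weights are averaged over heads: (batch, 1, 1)

    # Value projection: the last third of the packed input projection.
    w_v = attn.in_proj_weight[2 * embed_dim:]
    b_v = attn.in_proj_bias[2 * embed_dim:]
    manual = attn.out_proj(F.linear(x, w_v, b_v))

print(weights.shape, torch.allclose(attn_output, manual, atol=1e-5))
```

In other words, the transformer block in this model acts as a learned projection with residual connections and LayerNorm rather than attention across image regions. Attending over spatial positions would require feeding the unpooled 7×7 EfficientNet B0 feature map as 49 tokens, which would be a different model from the released checkpoint.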