|
<!DOCTYPE html> |
|
<html> |
|
<head> |
|
<meta charset="utf-8" /> |
|
<meta name="viewport" content="width=device-width" /> |
|
<title>MLTest Demo</title> |
|
<link rel="stylesheet" href="style.css" /> |
|
<link href="https://fonts.googleapis.com/css2?family=Source+Sans+Pro:ital,wght@0,200;0,300;0,400;0,600;0,700;0,900;1,200;1,300;1,400;1,600;1,700;1,900&display=swap" rel="stylesheet"> |
|
<link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:wght@400;600;700&display=swap" rel="stylesheet"> |
|
</head> |
|
<body> |
|
<div class="container"> |
|
<div> |
|
<h1>MLTest</h1> |
|
<p> |
|
This is a demo of MLTest on the dataset |
|
<a href="https://huggingface.co/datasets/marmal88/skin_cancer"><code>marmal88/skin_cancer</code></a>. |
|
Each image is labeled with one of three different kinds of cancers. |
|
</p> |
|
<p> |
|
Five models have been trained on this dataset: two Swin Transformer variants, ViT, ResNet, and BEiT. The test results for each model can
be inspected in the dashboard below.
|
</p> |
|
<p> |
|
<b>Performance tests</b>: |
|
to measure how well each model performs, we compute common performance metrics such as accuracy,
precision, recall, and F1 score.
|
</p> |
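<p>
As an illustration of the metrics named above, here is a minimal sketch in plain Python. It is not MLTest's implementation: the function name <code>classification_metrics</code> and the toy labels are assumptions made for this example, and precision, recall, and F1 are computed one-vs-rest for a single class.
</p>
<pre>
```python
def classification_metrics(y_true, y_pred, positive):
    """Overall accuracy, plus precision, recall and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so an empty class yields 0.0 instead of an error.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical, not taken from the skin cancer dataset).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = classification_metrics(y_true, y_pred, positive=1)
print(metrics)
```
</pre>
<p>
In a multi-class setting such as this one, the per-class values would typically be computed for every class label and then averaged.
</p>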
|
<p> |
|
<b>Failure clusters</b>: |
|
these clusters show where the model is failing and can be inspected in the "Failure Clusters" tab.
They are detected automatically for different combinations of metadata.
For example, the BEiT transformer has a significantly lower accuracy on images of cancers on the back with class label <code>0</code>.
|
</p> |
|
<p> |
|
<b>Robustness</b>: these tests help ML developers evaluate how well their model performs under different conditions. |
|
These conditions include different levels of brightness, compression, and many other types of interference.
|
</p><p> |
|
The following robustness tests were enabled for this test case: |
|
</p> |
|
<ul> |
|
<li>Brightness</li> |
|
<li>CompressImage</li> |
|
<li>Contrast</li> |
|
<li>DarkSpots</li> |
|
<li>GaussianBlur</li> |
|
<li>GaussianNoise</li> |
|
<li>Glare</li> |
|
<li>GlassBlur</li> |
|
<li>HorizontalFlip</li> |
|
<li>MedianBlur</li> |
|
<li>MotionBlur</li> |
|
<li>OilSpots</li> |
|
<li>Perspective</li> |
|
<li>VerticalFlip</li> |
|
</ul> |
|
<p> |
|
The full list of transforms supported by MLTest can be found in the <a target="_blank" href="https://docs.lakera.ai/configuration/robustness">documentation</a>. |
|
</p> |
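<p>
The idea behind a robustness test can be sketched as follows: apply a transform to every input, then compare accuracy before and after. This is a simplified illustration, not how MLTest implements its <code>Brightness</code> transform. The helper <code>adjust_brightness</code>, the stand-in classifier <code>toy_model</code>, and the pixel data are all hypothetical.
</p>
<pre>
```python
def adjust_brightness(pixels, factor):
    """Scale 8-bit pixel values by `factor`, clipping the result to 0..255."""
    return [min(255, max(0, round(p * factor))) for p in pixels]


def accuracy(model, images, labels):
    """Fraction of images for which the model predicts the given label."""
    correct = sum(1 for img, label in zip(images, labels) if model(img) == label)
    return correct / len(labels)


def toy_model(pixels):
    """Hypothetical stand-in classifier: predicts 1 for bright images, else 0."""
    return int(sum(pixels) / len(pixels) >= 128)


# Four tiny "images", each a flat list of grayscale pixel values (made up).
images = [[40, 60, 80], [100, 110, 120], [150, 160, 170], [200, 220, 240]]
labels = [0, 0, 1, 1]

clean_acc = accuracy(toy_model, images, labels)
bright_images = [adjust_brightness(img, 1.5) for img in images]
bright_acc = accuracy(toy_model, bright_images, labels)

print("clean accuracy:", clean_acc)
print("accuracy at 1.5x brightness:", bright_acc)
```
</pre>
<p>
A drop in accuracy after the transform suggests that the model is sensitive to that condition.
</p>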
|
<p> |
|
<b>Fairness tests</b>: these tests measure how fair your model is, that is, whether its performance depends
on a protected attribute of a person. In this dataset, the age and gender of a subject may be considered
protected attributes.
|
</p> |
|
<p> |
|
We ran two types of fairness tests on the age and gender attributes.
The <code>Equalized Odds</code> test checks that the true positive and false positive rates are equal across the groups defined by a protected attribute.
The <code>Predictive Equality</code> test checks that the false positive rates are equal across those groups.
|
</p> |
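<p>
The two criteria can be illustrated with a small sketch that computes per-group true positive and false positive rates and then compares them. It is not MLTest's implementation. The helpers <code>group_rates</code> and <code>max_gap</code>, the binary labels, and the group names are assumptions for illustration. A real test would also allow some tolerance rather than requiring exact equality.
</p>
<pre>
```python
def group_rates(y_true, y_pred, groups):
    """Per-group true positive rate (TPR) and false positive rate (FPR) for binary labels."""
    stats = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        s = stats.setdefault(group, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if truth == 1:
            s["tp" if pred == 1 else "fn"] += 1
        else:
            s["fp" if pred == 1 else "tn"] += 1

    rates = {}
    for group, s in stats.items():
        positives = s["tp"] + s["fn"]
        negatives = s["fp"] + s["tn"]
        rates[group] = {
            "tpr": s["tp"] / positives if positives else 0.0,
            "fpr": s["fp"] / negatives if negatives else 0.0,
        }
    return rates


def max_gap(rates, key):
    """Largest difference in the chosen rate between any two groups."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Toy binary predictions for two hypothetical groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0]
groups = ["female"] * 4 + ["male"] * 4

rates = group_rates(y_true, y_pred, groups)
fpr_gap = max_gap(rates, "fpr")  # Predictive Equality compares FPR only
tpr_gap = max_gap(rates, "tpr")  # Equalized Odds also compares TPR

print(rates)
print("FPR gap:", fpr_gap, "TPR gap:", tpr_gap)
```
</pre>
<p>
In this toy data the false positive rates match across both groups, so Predictive Equality holds. The true positive rates differ, so Equalized Odds does not.
</p>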
|
<p> |
|
More fairness tests supported by MLTest can be found in the <a target="_blank" href="https://docs.lakera.ai/configuration/fairness">documentation</a>. |
|
</p> |
|
</div> |
|
|
|
<iframe src="https://hf.lakera.ai/projects/skin_cancer_run2"></iframe> |
|
</div> |
|
</body> |
|
</html> |
|
|