<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>MLTest Demo</title>
<link rel="stylesheet" href="style.css" />
<link href="https://fonts.googleapis.com/css2?family=Source+Sans+Pro:ital,wght@0,200;0,300;0,400;0,600;0,700;0,900;1,200;1,300;1,400;1,600;1,700;1,900&display=swap" rel="stylesheet">
<link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:wght@400;600;700&display=swap" rel="stylesheet">
</head>
<body>
<div class="container">
<div>
<h1>MLTest</h1>
<p>
This is a demo of MLTest on the dataset
<a href="https://huggingface.co/datasets/marmal88/skin_cancer"><code>marmal88/skin_cancer</code></a>.
Each image is labeled with one of three types of skin cancer.
</p>
<p>
Five models have been trained on this dataset: two Swin Transformer variants, ViT, ResNet, and BEiT. The test results for each model can
be inspected in the dashboard below.
</p>
<p>
<b>Performance tests</b>:
to measure how well each model performs, we compute common performance metrics such as accuracy,
precision, recall, and F1 score.
</p>
<p>
<b>Failure clusters</b>:
these clusters show where a model is failing and can be inspected in the "Failure Clusters" tab.
They are detected automatically
across different combinations of metadata.
For example, the BEiT model has significantly lower accuracy on images of cancers on the back with class label <code>0</code>.
</p>
<p>
<b>Robustness</b>: these tests help ML developers evaluate how well their model performs under different conditions.
These conditions could include different levels of brightness, compression, and many other types of interference.
</p><p>
The following robustness tests were enabled for this test case:
</p>
<ul>
<li>Brightness</li>
<li>CompressImage</li>
<li>Contrast</li>
<li>DarkSpots</li>
<li>GaussianBlur</li>
<li>GaussianNoise</li>
<li>Glare</li>
<li>GlassBlur</li>
<li>HorizontalFlip</li>
<li>MedianBlur</li>
<li>MotionBlur</li>
<li>OilSpots</li>
<li>Perspective</li>
<li>VerticalFlip</li>
</ul>
<p>
The full list of transforms supported by MLTest can be found in the <a target="_blank" href="https://docs.lakera.ai/configuration/robustness">documentation</a>.
</p>
<p>
<b>Fairness tests</b>: these tests measure how fair the model is, that is, whether its performance depends
on a protected attribute of a person. In this dataset, the age and gender of a subject may be considered
protected attributes.
</p>
<p>
We ran two types of fairness tests on the age and gender attributes.
The <code>Equalized Odds</code> test checks that both the true positive rate and the false positive rate are equal across the groups defined by a protected attribute.
The <code>Predictive Equality</code> test checks only that the false positive rate is equal across those groups.
</p>
<p>
More fairness tests supported by MLTest can be found in the <a target="_blank" href="https://docs.lakera.ai/configuration/fairness">documentation</a>.
</p>
</div>
<iframe src="https://hf.lakera.ai/projects/skin_cancer_run2"></iframe>
</div>
</body>
</html>