Sam Heutmaker committed
Commit 4ffc59c · 1 parent: fb437aa
fix graphs
README.md CHANGED
@@ -51,13 +51,14 @@ Performance metrics on our internal evaluation set:
 
 ### Benchmark Visualizations
 
-<
-<img src="./assets/judge-score.png" alt="Average Judge Score Comparison" width="
-<img src="./assets/rouge-1.png" alt="ROUGE-1 Score Comparison" width="
-
-
-<img src="./assets/
-
+<p align="center">
+<img src="./assets/judge-score.png" alt="Average Judge Score Comparison" width="49%" />
+<img src="./assets/rouge-1.png" alt="ROUGE-1 Score Comparison" width="49%" />
+</p>
+<p align="center">
+<img src="./assets/rouge-L.png" alt="ROUGE-L Score Comparison" width="49%" />
+<img src="./assets/bleu.png" alt="BLEU Score Comparison" width="49%" />
+</p>
 
 FP8 quantization showed no measurable quality degradation compared to bf16 precision.
 
@@ -75,9 +76,7 @@ GrassData/ClipTagger-12b delivers frontier-quality performance at a fraction of
 
 *Cost calculations based on 700 input tokens and 250 output tokens per generation.
 
-<
-<img src="./assets/cost.png" alt="Cost Comparison Per 1 Million Generations" width="80%" />
-</div>
+<img src="./assets/cost.png" alt="Cost Comparison Per 1 Million Generations" width="100%" />
 
 ClipTagger-12b offers **15x cost savings** compared to GPT-4.1 and **17x cost savings** compared to Claude 4 Sonnet, while maintaining comparable quality metrics.
 
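The README's cost footnote (700 input tokens and 250 output tokens per generation) implies a simple per-token calculation behind the "cost per 1 million generations" chart. The following is a minimal sketch of that arithmetic, not the authors' actual script. The function names are hypothetical, and the prices in the usage lines are placeholders chosen for illustration, not real provider pricing.

```python
def cost_per_million_generations(input_price_per_mtok: float,
                                 output_price_per_mtok: float,
                                 input_tokens: int = 700,
                                 output_tokens: int = 250) -> float:
    """Return the USD cost of 1,000,000 generations.

    Prices are in USD per 1 million tokens. One million generations at
    `input_tokens` tokens each consume `input_tokens` million input tokens,
    so the two factors of one million cancel out.
    """
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok)


def savings_factor(baseline_cost: float, candidate_cost: float) -> float:
    """Return how many times cheaper the candidate is than the baseline."""
    return baseline_cost / candidate_cost


# Placeholder prices (assumptions for illustration only).
baseline = cost_per_million_generations(1.0, 2.0)
candidate = cost_per_million_generations(0.1, 0.2)
print(f"baseline:  ${baseline:,.2f} per 1M generations")
print(f"candidate: ${candidate:,.2f} per 1M generations")
print(f"savings:   {savings_factor(baseline, candidate):.1f}x")
```

Substituting each provider's published input and output prices into `cost_per_million_generations` would reproduce a comparison of the kind shown in `cost.png`, under the stated token counts.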