Update README.md
README.md (CHANGED)
**Removed:** the previous card's content for `Llama-3.3-70B-Instruct-gptq-4b…` (its overview sentence, its **Model** and **Parameters** entries, the 4-bit quantization notes in both interpretation sections, and the closing call-out). The updated card below replaces them; the surrounding structure (`## ⚙️ Model Configuration`, the `📌 **Interpretation:**` blocks, and the hardware/precision entries) is unchanged.
```yaml
---
license: cc-by-4.0
datasets:
- allenai/c4
language:
- en
metrics:
- accuracy
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
pipeline_tag: text-generation
---
```
# Overview

This document presents the evaluation results of `DeepSeek-R1-Distill-Llama-70B`, a **4-bit quantized model using GPTQ**, evaluated with the **Language Model Evaluation Harness** on the **ARC-Challenge** benchmark.
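For reproducibility, here is a minimal sketch of the zero-shot run this card describes, using the harness's Python API (assumes `lm-eval` v0.4+; the checkpoint id is a placeholder, substitute this repo's GPTQ weights):

```python
import lm_eval

# Placeholder id: the metadata lists deepseek-ai/DeepSeek-R1-Distill-Llama-70B
# as the base model; point this at the GPTQ checkpoint the card describes.
MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face backend (`hf`)
    model_args=f"pretrained={MODEL_ID},dtype=float16",
    tasks=["arc_challenge"],                      # ARC-Challenge benchmark
    num_fewshot=0,                                # zero-shot, as reported here
    batch_size=1,                                 # single-sample batch size
    device="cuda:0",
)
print(results["results"]["arc_challenge"])        # accuracy metrics
```

The `lm_eval` command-line tool accepts the same settings as flags if you prefer a shell workflow.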
---
## ⚙️ Model Configuration

- **Model:** `DeepSeek-R1-Distill-Llama-70B`
- **Parameters:** `70 billion`
- **Quantization:** `4-bit GPTQ`
- **Source:** Hugging Face (`hf`)
- **Precision:** `torch.float16`
- **Hardware:** `NVIDIA A100 80GB PCIe`
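To load the quantized checkpoint outside the harness, something like the sketch below should work. It assumes `transformers` plus `optimum` and a GPTQ backend such as `auto-gptq` or `gptqmodel` are installed; the repo id is again a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the Hub id of this card's GPTQ checkpoint.
MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",          # spread layers across available GPUs
    torch_dtype=torch.float16,  # matches the card's torch.float16 setting
)

prompt = "Question: Which gas do plants absorb from the atmosphere?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```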
📌 **Interpretation:**
- The evaluation was performed on a **high-performance GPU (A100 80GB)**.
- The model is significantly larger than the previous 8B version, with **GPTQ 4-bit quantization reducing memory footprint** (see the rough estimate below).
- A **single-sample batch size** was used, which might slow evaluation speed.
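As a rough back-of-envelope estimate (ours, not a number from the harness output): at `torch.float16` the 70 billion weights alone take about 70e9 × 2 bytes ≈ 140 GB, more than a single A100 80GB holds, while at 4 bits they take roughly 70e9 × 0.5 bytes ≈ 35 GB plus quantization scales and runtime overhead, which fits on one card.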
---
- The `"higher_is_better"` flag confirms that **higher accuracy is preferred**.
- The model's **raw accuracy (21.2%)** is significantly lower than state-of-the-art models (**60–80%** on ARC-Challenge).
- **Quantization Impact:** The **4-bit GPTQ quantization** reduces memory usage but may also reduce accuracy slightly.
- **Zero-shot Limitation:** Performance could improve with **few-shot prompting** (providing examples before testing); see the sketch below.
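A few-shot variant of the same harness call, addressing the limitation above (25-shot is a common ARC-Challenge setting; the checkpoint id remains a placeholder):

```python
import lm_eval

# Identical to the zero-shot run, except the harness prepends 25 solved
# examples to each prompt before scoring.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=deepseek-ai/DeepSeek-R1-Distill-Llama-70B,dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=25,   # few-shot prompting instead of zero-shot
    batch_size=1,
)
print(results["results"]["arc_challenge"])
```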
---

📌 Let us know if you need further analysis or model tuning! 🚀