rwmasood committed · Commit 98a1b61 · verified · 1 Parent(s): f455523

Update README.md

Files changed (1):
  1. README.md +22 -8

README.md CHANGED
@@ -1,7 +1,20 @@
- # Language Model Evaluation Results
+ ---
+ license: cc-by-4.0
+ datasets:
+ - allenai/c4
+ language:
+ - en
+ metrics:
+ - accuracy
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Llama-70B
+ pipeline_tag: text-generation
+ ---
+
+

- ## Overview
- This document presents the evaluation results of `Llama-3.3-70B-Instruct-gptq-4bit` using the **Language Model Evaluation Harness** on the **ARC-Challenge** benchmark.
+ # Overview
+ This document presents the evaluation results of `DeepSeek-R1-Distill-Llama-70B`, a **4-bit quantized model using GPTQ**, evaluated with the **Language Model Evaluation Harness** on the **ARC-Challenge** benchmark.

  ---

@@ -23,8 +36,9 @@ This document presents the evaluation results of `Llama-3.3-70B-Instruct-gptq-4b

  ## ⚙️ Model Configuration

- - **Model:** `Llama-3.1-8B-Instruct-gptq-4bit`
- - **Parameters:** `1.05 billion` (Quantized 4-bit model)
+ - **Model:** `DeepSeek-R1-Distill-Llama-70B`
+ - **Parameters:** `70 billion`
+ - **Quantization:** `4-bit GPTQ`
  - **Source:** Hugging Face (`hf`)
  - **Precision:** `torch.float16`
  - **Hardware:** `NVIDIA A100 80GB PCIe`
@@ -35,7 +49,7 @@ This document presents the evaluation results of `Llama-3.3-70B-Instruct-gptq-4b

  📌 **Interpretation:**
  - The evaluation was performed on a **high-performance GPU (A100 80GB)**.
- - The model is **4-bit quantized**, reducing memory usage but possibly affecting accuracy.
+ - The model is significantly larger than the previous 8B version, with **GPTQ 4-bit quantization reducing memory footprint**.
  - A **single-sample batch size** was used, which might slow evaluation speed.

  ---
@@ -57,9 +71,9 @@ This document presents the evaluation results of `Llama-3.3-70B-Instruct-gptq-4b

  - The `"higher_is_better"` flag confirms that **higher accuracy is preferred**.
  - The model's **raw accuracy (21.2%)** is significantly lower compared to state-of-the-art models (**60–80%** on ARC-Challenge).
- - **Quantization Impact:** The **4-bit quantized model** might perform worse than a full-precision version.
+ - **Quantization Impact:** The **4-bit GPTQ quantization** reduces memory usage but may also impact accuracy slightly.
  - **Zero-shot Limitation:** Performance could improve with **few-shot prompting** (providing examples before testing).

  ---

- 📌 Let us know if you need further analysis or model tuning! 🚀
+ 📌 Let us know if you need further analysis or model tuning! 🚀
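For context on the setup the updated README describes, below is a minimal sketch of loading a 4-bit GPTQ checkpoint with `transformers`, assuming `optimum` plus a GPTQ backend (e.g. `gptqmodel` or `auto-gptq`) is installed. The repo id is a placeholder: the diff names the base model (`deepseek-ai/DeepSeek-R1-Distill-Llama-70B`) but not the quantized checkpoint's own path.

```python
# Sketch: load a 4-bit GPTQ checkpoint matching the Model Configuration section
# (Source: hf, Precision: torch.float16, Hardware: A100 80GB).
# Assumes transformers with GPTQ support (optimum + a GPTQ backend) is installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/DeepSeek-R1-Distill-Llama-70B-gptq-4bit"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # matches the README's stated precision
    device_map="auto",          # places the quantized weights on the available GPU(s)
)
# The GPTQ quantization settings are read from the checkpoint's config,
# so no explicit quantization arguments are needed at load time.
```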
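And a sketch of reproducing the ARC-Challenge run the README reports, assuming the Language Model Evaluation Harness v0.4+ Python API (`pip install lm-eval`). `num_fewshot=0` and `batch_size=1` mirror the zero-shot, single-sample setup described above; raising `num_fewshot` (e.g. to 5) is the few-shot variant the "Zero-shot Limitation" bullet suggests.

```python
# Sketch: zero-shot ARC-Challenge evaluation via lm-evaluation-harness
# (assumes lm-eval >= 0.4; the model repo id below is a placeholder).
import lm_eval

MODEL_ID = "your-org/DeepSeek-R1-Distill-Llama-70B-gptq-4bit"  # hypothetical repo id

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face backend, as in the README ("Source: hf")
    model_args=f"pretrained={MODEL_ID},dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=0,   # zero-shot; try e.g. 5 for a few-shot run
    batch_size=1,    # single-sample batches, as noted in the README
    device="cuda:0",
)
print(results["results"]["arc_challenge"])  # accuracy metrics (acc / acc_norm)
```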