Update README.md
README.md (CHANGED)
**Removed:** the previous card's content for `Llama-3.3-70B-Instruct-gptq-4b…` (its overview sentence, its **Model** and **Parameters** entries, the 4-bit quantization notes in both interpretation sections, and the closing call-out). The updated card below replaces them; the surrounding structure (`## ⚙️ Model Configuration`, the `📌 **Interpretation:**` blocks, and the hardware/precision entries) is unchanged.
```yaml
---
license: cc-by-4.0
datasets:
- allenai/c4
language:
- en
metrics:
- accuracy
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
pipeline_tag: text-generation
---
```
# Overview

This document presents the evaluation results of `DeepSeek-R1-Distill-Llama-70B`, a **4-bit quantized model using GPTQ**, evaluated with the **Language Model Evaluation Harness** on the **ARC-Challenge** benchmark.
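For reproducibility, here is a minimal sketch of the zero-shot run this card describes, using the harness's Python API (assumes `lm-eval` v0.4+; the checkpoint id is a placeholder, substitute this repo's GPTQ weights):

```python
import lm_eval

# Placeholder id: the metadata lists deepseek-ai/DeepSeek-R1-Distill-Llama-70B
# as the base model; point this at the GPTQ checkpoint the card describes.
MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face backend (`hf`)
    model_args=f"pretrained={MODEL_ID},dtype=float16",
    tasks=["arc_challenge"],                      # ARC-Challenge benchmark
    num_fewshot=0,                                # zero-shot, as reported here
    batch_size=1,                                 # single-sample batch size
    device="cuda:0",
)
print(results["results"]["arc_challenge"])        # accuracy metrics
```

The `lm_eval` command-line tool accepts the same settings as flags if you prefer a shell workflow.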
---
## ⚙️ Model Configuration

- **Model:** `DeepSeek-R1-Distill-Llama-70B`
- **Parameters:** `70 billion`
- **Quantization:** `4-bit GPTQ`
- **Source:** Hugging Face (`hf`)
- **Precision:** `torch.float16`
- **Hardware:** `NVIDIA A100 80GB PCIe`
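To load the quantized checkpoint outside the harness, something like the sketch below should work. It assumes `transformers` plus `optimum` and a GPTQ backend such as `auto-gptq` or `gptqmodel` are installed; the repo id is again a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: substitute the Hub id of this card's GPTQ checkpoint.
MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",          # spread layers across available GPUs
    torch_dtype=torch.float16,  # matches the card's torch.float16 setting
)

prompt = "Question: Which gas do plants absorb from the atmosphere?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```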
📌 **Interpretation:**
- The evaluation was performed on a **high-performance GPU (A100 80GB)**.
- The model is significantly larger than the previous 8B version, with **GPTQ 4-bit quantization reducing memory footprint** (see the rough estimate below).
- A **single-sample batch size** was used, which might slow evaluation speed.
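As a rough back-of-envelope estimate (ours, not a number from the harness output): at `torch.float16` the 70 billion weights alone take about 70e9 × 2 bytes ≈ 140 GB, more than a single A100 80GB holds, while at 4 bits they take roughly 70e9 × 0.5 bytes ≈ 35 GB plus quantization scales and runtime overhead, which fits on one card.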
---
- The `"higher_is_better"` flag confirms that **higher accuracy is preferred**.
- The model's **raw accuracy (21.2%)** is significantly lower than state-of-the-art models (**60–80%** on ARC-Challenge).
- **Quantization Impact:** The **4-bit GPTQ quantization** reduces memory usage but may also reduce accuracy slightly.
- **Zero-shot Limitation:** Performance could improve with **few-shot prompting** (providing examples before testing); see the sketch below.
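A few-shot variant of the same harness call, addressing the limitation above (25-shot is a common ARC-Challenge setting; the checkpoint id remains a placeholder):

```python
import lm_eval

# Identical to the zero-shot run, except the harness prepends 25 solved
# examples to each prompt before scoring.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=deepseek-ai/DeepSeek-R1-Distill-Llama-70B,dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=25,   # few-shot prompting instead of zero-shot
    batch_size=1,
)
print(results["results"]["arc_challenge"])
```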
---

📌 Let us know if you need further analysis or model tuning! 🚀