Upload README.md with huggingface_hub
README.md
CHANGED
@@ -48,9 +48,9 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/b
 - Response Rate: Rate of response generation after the first response token.
 - Tiny MMLU: Tiny MMLU (Massive Multitask Language Understanding) is an English language benchmark designed to measure knowledge acquired during pretraining by evaluating models exclusively in zero-shot and few-shot settings. This makes the benchmark more challenging and more similar to how we evaluate humans.

-| Model | Device | Chipset | Target Runtime | Response Rate (tokens per second) | Time To First Token (range, seconds)
-
-| Baichuan2-7B | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | QNN | 7.72 | 0.20804799999999998 - 6.6575359999999995 |
+| Model | Device | Chipset | Target Runtime | Response Rate (tokens per second) | Time To First Token (range, seconds)
+|---|---|---|---|---|---|
+| Baichuan2-7B | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | QNN | 7.72 | 0.20804799999999998 - 6.6575359999999995 | -- | Use Export Script |

 ## Deploying Baichuan2-7B on-device
