qaihm-bot committed
Commit f74216d · verified · 1 Parent(s): 8877bcd

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +4 -5
README.md CHANGED

@@ -1,10 +1,9 @@
 ---
 library_name: pytorch
-license: apache-2.0
+license: other
 tags:
 - llm
 - generative_ai
-- quantized
 - android
 pipeline_tag: text-generation
 
@@ -25,7 +24,7 @@ This model is an implementation of Baichuan2-7B found [here](https://github.com/
 
 ### Model Details
 
-- **Model Type:** Text generation
+- **Model Type:** Model_use_case.text_generation
 - **Model Stats:**
   - Input sequence length for Prompt Processor: 128
   - Context length: 4096
@@ -49,9 +48,9 @@ This model is an implementation of Baichuan2-7B found [here](https://github.com/
 - TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
 - Response Rate: Rate of response generation after the first response token.
 
-| Model | Device | Chipset | Target Runtime | Response Rate (tokens per second) | Time To First Token (range, seconds)
+| Model | Precision | Device | Chipset | Target Runtime | Response Rate (tokens per second) | Time To First Token (range, seconds)
 |---|---|---|---|---|---|
-| Baichuan2-7B | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite | QNN | 7.72 | 0.20804799999999998 - 6.6575359999999995 | -- | Use Export Script |
+| Baichuan2-7B | w4a16 | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite Mobile | QNN | 7.72 | 0.208048 - 6.657536 | -- | Use Export Script |
 
 ## Deploying Baichuan2-7B on-device
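For reference, the TTFT range in the updated table is self-consistent with the model stats: the full 4096-token context is 4096 / 128 = 32 prompt-processor iterations, and 32 × 0.208048 s = 6.657536 s, exactly the table's upper bound (the old row shows the same values before rounding). A minimal sketch of that arithmetic, assuming, as the exact match suggests, that TTFT scales linearly with iteration count; the variable names are illustrative, not from the export script:

```python
# Sketch (not from the commit): derive the table's TTFT range from the
# model stats, assuming TTFT grows linearly with prompt-processor iterations.

seq_len = 128          # input sequence length for the prompt processor
context_len = 4096     # model context length
ttft_short = 0.208048  # seconds for one iteration (table's lower bound)

iterations = context_len // seq_len   # 4096 / 128 = 32 iterations
ttft_long = ttft_short * iterations   # 0.208048 * 32 = 6.657536 s

print(f"TTFT range: {ttft_short} - {ttft_long:.6f} s ({iterations} iterations)")
```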