psynote123 committed on
Commit 79ee2b7 · verified · 1 Parent(s): bd544e9

Update README.md

  1. README.md +19 -1
README.md CHANGED
@@ -137,13 +137,31 @@ The `Original` column in latency benchmarks typically refers to the Hugging Face
137
 
138
  ### Latency benchmarks (Tokens Per Second - TPS)
139
 
140
- Performance for generating audio (decoder stage).
141
 
142
  | GPU Type | S | M | L | XL (Compiled Original) | Original (HF, non-compiled) |
143
  |----------|--------|--------|--------|------------------------|-----------------------------|
144
  | H100 | 122.75 | 124.70 | 126.21 | 126.71 | 45.33 |
145
  | L40S | 96.74 | 90.90 | 86.51 | 83.31 | 44.69 |
146
 
147
 
148
  ## Links
149
 
 
137
 
138
  ### Latency benchmarks (Tokens Per Second - TPS)
139
 
140
+ Performance for generating audio (decoder stage; `max_new_tokens = 256`, i.e. 5 seconds of audio).
141
 
142
  | GPU Type | S | M | L | XL (Compiled Original) | Original (HF, non-compiled) |
143
  |----------|--------|--------|--------|------------------------|-----------------------------|
144
  | H100 | 122.75 | 124.70 | 126.21 | 126.71 | 45.33 |
145
  | L40S | 96.74 | 90.90 | 86.51 | 83.31 | 44.69 |
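
To put the TPS figures in context, they can be converted into wall-clock time for the 256-token (5-second) clip. This is a minimal sketch, assuming TPS is a steady decoding rate over the whole generation and that 256 tokens correspond to 5 seconds of audio as stated above. The function names are illustrative and not part of any library.

```python
MAX_NEW_TOKENS = 256   # from the benchmark description
AUDIO_SECONDS = 5.0    # audio duration stated for 256 tokens

# Batch-size-1 TPS values copied from the table above.
TPS = {
    "H100": {"S": 122.75, "M": 124.70, "L": 126.21, "XL": 126.71, "Original": 45.33},
    "L40S": {"S": 96.74, "M": 90.90, "L": 86.51, "XL": 83.31, "Original": 44.69},
}

def generation_seconds(tps, tokens=MAX_NEW_TOKENS):
    """Wall-clock seconds to decode `tokens` tokens at a constant rate `tps`."""
    return tokens / tps

def realtime_factor(tps):
    """Seconds of audio produced per second of decoding (assumed constant rate)."""
    return AUDIO_SECONDS / generation_seconds(tps)

for gpu, row in TPS.items():
    for mode, tps in row.items():
        print(f"{gpu} {mode}: {generation_seconds(tps):.2f} s, "
              f"{realtime_factor(tps):.2f}x real time")
```

Under these assumptions, the non-compiled original needs more than 5 seconds to decode the 5-second clip on both GPUs, while every other configuration in the table finishes faster than real time.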
146
 
147
+ #### Performance by Batch Size
148
+
149
+ **Batch Size 16:**
150
+ | GPU Type | S Mode (TPS) | XL Mode (TPS) |
151
+ |----------|--------------|---------------|
152
+ | H100 | 94.21 | 97.96 |
153
+ | L40S | 69.66 | 63.19 |
154
+
155
+ **Batch Size 32:**
156
+ | GPU Type | S Mode (TPS) | XL Mode (TPS) |
157
+ |----------|--------------|---------------|
158
+ | H100 | 77.15 | 76.64 |
159
+ | L40S | 54.81 | 51.34 |
160
+
161
+ > **Note:** The currently deployed models support only batch size 1. Support for larger batch sizes is expected in upcoming updates.
162
+
163
+ As the results show, TPS decreases as batch size increases on both GPUs, which is typical for inference workloads.
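
The size of the drop can be quantified by dividing each batched TPS by the batch-size-1 figure. This is a rough sketch, assuming that "S Mode" and "XL Mode" in the batch tables correspond to the `S` and `XL (Compiled Original)` columns of the batch-size-1 table and that all runs used the same setup. The helper name is hypothetical.

```python
# TPS values copied from the tables in this README section.
# Assumption: "S Mode"/"XL Mode" match the S and XL columns measured at batch size 1.
TPS_BY_BATCH = {
    ("H100", "S"): {1: 122.75, 16: 94.21, 32: 77.15},
    ("H100", "XL"): {1: 126.71, 16: 97.96, 32: 76.64},
    ("L40S", "S"): {1: 96.74, 16: 69.66, 32: 54.81},
    ("L40S", "XL"): {1: 83.31, 16: 63.19, 32: 51.34},
}

def retained_fraction(gpu, mode, batch_size):
    """Fraction of the batch-size-1 TPS that remains at the given batch size."""
    row = TPS_BY_BATCH[(gpu, mode)]
    return row[batch_size] / row[1]

for (gpu, mode), row in TPS_BY_BATCH.items():
    for batch_size in sorted(row):
        frac = retained_fraction(gpu, mode, batch_size)
        print(f"{gpu} {mode} batch={batch_size}: {frac:.0%} of batch-1 TPS")
```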
164
+
165
 
166
  ## Links
167