Update README.md
README.md
CHANGED
@@ -137,13 +137,31 @@ The `Original` column in latency benchmarks typically refers to the Hugging Face
 
 ### Latency benchmarks (Tokens Per Second - TPS)
 
-Performance for generating audio (decoder stage).
+Performance for generating audio (decoder stage, max_new_tokens = 256 (5 seconds audio)).
 
 | GPU Type | S | M | L | XL (Compiled Original) | Original (HF, non-compiled) |
 |----------|--------|--------|--------|------------------------|-----------------------------|
 | H100 | 122.75 | 124.70 | 126.21 | 126.71 | 45.33 |
 | L40S | 96.74 | 90.90 | 86.51 | 83.31 | 44.69 |
 
+#### Performance by Batch Size
+
+**Batch Size 16:**
+| GPU Type | S Mode (TPS) | XL Mode (TPS) |
+|----------|--------------|---------------|
+| H100 | 94.21 | 97.96 |
+| L40S | 69.66 | 63.19 |
+
+**Batch Size 32:**
+| GPU Type | S Mode (TPS) | XL Mode (TPS) |
+|----------|--------------|---------------|
+| H100 | 77.15 | 76.64 |
+| L40S | 54.81 | 51.34 |
+
+> **Note:** Currently deployed models support only batch size = 1. Expect upcoming updates for larger batch size support.
+
+As shown in the results, smaller batch sizes typically demonstrate higher per-token performance, which is typical for inference tasks.
+
 
 ## Links
 
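To make the figures in this change easier to read, here is a minimal sketch that turns the reported TPS values into decode time, speedup over the `Original` column, and the TPS drop at larger batch sizes. The numbers are copied from the tables in the README. The function names are hypothetical, and the sketch assumes the reported TPS is steady over the whole 256-token generation, which the README does not state.

```python
# TPS values copied from the README tables (first table vs. batch 16 / 32 tables).
TPS_BATCH_1 = {
    "H100": {"S": 122.75, "M": 124.70, "L": 126.21, "XL": 126.71, "Original": 45.33},
    "L40S": {"S": 96.74, "M": 90.90, "L": 86.51, "XL": 83.31, "Original": 44.69},
}
TPS_BATCHED = {
    16: {"H100": {"S": 94.21, "XL": 97.96}, "L40S": {"S": 69.66, "XL": 63.19}},
    32: {"H100": {"S": 77.15, "XL": 76.64}, "L40S": {"S": 54.81, "XL": 51.34}},
}

MAX_NEW_TOKENS = 256   # benchmark setting stated in the README
AUDIO_SECONDS = 5.0    # README: 256 tokens correspond to 5 seconds of audio


def generation_seconds(tps: float, tokens: int = MAX_NEW_TOKENS) -> float:
    """Decode time for `tokens` tokens, ASSUMING a constant token rate."""
    return tokens / tps


def speedup_over_original(gpu: str, mode: str) -> float:
    """Ratio of a mode's TPS to the non-compiled HF baseline on the same GPU."""
    row = TPS_BATCH_1[gpu]
    return row[mode] / row["Original"]


def realtime_factor(tps: float) -> float:
    """Seconds of audio produced per second of decoding (values > 1 beat real time)."""
    return AUDIO_SECONDS / generation_seconds(tps)


def tps_relative_to_first_table(batch_size: int, gpu: str, mode: str) -> float:
    """Reported TPS at `batch_size` as a fraction of the first table's TPS."""
    return TPS_BATCHED[batch_size][gpu][mode] / TPS_BATCH_1[gpu][mode]


if __name__ == "__main__":
    for gpu, row in TPS_BATCH_1.items():
        for mode, tps in row.items():
            print(
                f"{gpu} {mode}: {generation_seconds(tps):.2f} s for 256 tokens, "
                f"{speedup_over_original(gpu, mode):.2f}x vs Original, "
                f"{realtime_factor(tps):.2f}x real time"
            )
    for batch_size in TPS_BATCHED:
        ratio = tps_relative_to_first_table(batch_size, "H100", "S")
        print(f"H100 S, batch {batch_size}: {ratio:.2%} of the first table's TPS")
```

The README does not say whether the batched TPS values are per sequence or aggregated over the batch, so the sketch compares them only as ratios and does not derive a total throughput.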