TheStageAI/Elastic-musicgen-large

music-generation

psynote123 committed 4 days ago

Commit 79ee2b7 · verified · 1 parent: bd544e9

Update README.md

Files changed (1)

README.md +19 -1

@@ -137,13 +137,31 @@ The `Original` column in latency benchmarks typically refers to the Hugging Face
 ### Latency benchmarks (Tokens Per Second - TPS)
-Performance for generating audio (decoder stage).
+Performance for generating audio (decoder stage, `max_new_tokens = 256`, roughly 5 seconds of audio).
 | GPU Type | S      | M      | L      | XL (Compiled Original) | Original (HF, non-compiled) |
 |----------|--------|--------|--------|------------------------|-----------------------------|
 | H100     | 122.75 | 124.70 | 126.21 | 126.71                 | 45.33                       |
 | L40S     | 96.74  | 90.90  | 86.51  | 83.31                  | 44.69                       |
+#### Performance by Batch Size
+**Batch Size 16:**
+| GPU Type | S Mode (TPS) | XL Mode (TPS) |
+|----------|--------------|---------------|
+| H100     | 94.21        | 97.96         |
+| L40S     | 69.66        | 63.19         |
+**Batch Size 32:**
+| GPU Type | S Mode (TPS) | XL Mode (TPS) |
+|----------|--------------|---------------|
+| H100     | 77.15        | 76.64         |
+| L40S     | 54.81        | 51.34         |
+> **Note:** The currently deployed models support only batch size 1. Support for larger batch sizes is planned for upcoming updates.
+As the results show, TPS decreases as the batch size grows (for example, S mode on H100 drops from 94.21 at batch size 16 to 77.15 at batch size 32), which is typical for inference workloads.
 ## Links
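
Reviewer note: to make the TPS figures easier to interpret, here is a minimal Python sketch that converts them into wall-clock generation time and speedup over the non-compiled baseline. The numbers are copied from the tables in this diff. The helper names (`audio_seconds`, `generation_seconds`, `speedup`) are hypothetical and not part of the repository, and the 50 tokens-per-audio-second frame rate is an assumption, chosen because it matches the README's statement that 256 tokens correspond to about 5 seconds of audio.

```python
# Benchmark figures copied from the latency table in this diff (tokens per second).
TPS = {
    "H100": {"S": 122.75, "M": 124.70, "L": 126.21, "XL": 126.71, "Original": 45.33},
    "L40S": {"S": 96.74, "M": 90.90, "L": 86.51, "XL": 83.31, "Original": 44.69},
}

# Assumption: one second of audio corresponds to 50 decoder tokens,
# consistent with the README's "256 tokens, about 5 seconds".
FRAME_RATE_HZ = 50
MAX_NEW_TOKENS = 256


def audio_seconds(tokens: int, frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Length of the generated audio clip, in seconds."""
    return tokens / frame_rate_hz


def generation_seconds(tokens: int, tps: float) -> float:
    """Wall-clock time needed to decode `tokens` tokens at `tps` tokens per second."""
    return tokens / tps


def speedup(gpu: str, mode: str) -> float:
    """Throughput of `mode` relative to the non-compiled Hugging Face baseline."""
    return TPS[gpu][mode] / TPS[gpu]["Original"]


if __name__ == "__main__":
    clip = audio_seconds(MAX_NEW_TOKENS)
    for gpu, modes in TPS.items():
        for mode, tps in modes.items():
            wall = generation_seconds(MAX_NEW_TOKENS, tps)
            print(
                f"{gpu} {mode}: {wall:.2f} s to generate {clip:.2f} s of audio, "
                f"{speedup(gpu, mode):.2f}x vs Original"
            )
```

Under the frame-rate assumption above, any mode with a TPS above 50 decodes faster than real time. The same helpers can be applied to the batch size 16 and 32 tables by swapping in their values.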