Update decoder benchmark docs for RTX 5090 and GH200
technical_report_capacitor_decoder.md
CHANGED
|
@@ -30,7 +30,30 @@ The important invariant is that whitening must be handled consistently on both
|
|
| 30 |
sides. If whitening is enabled upstream, keep the decoder default. If whitening
|
| 31 |
is disabled upstream, disable dewhitening in the decoder too.
|
| 32 |
|
| 33 |
-
## 3.
| 34 |
|
| 35 |
This export corresponds to roughly **300k training steps**. The saved run
|
| 36 |
configuration uses:
|
|
@@ -56,7 +79,7 @@ configuration uses:
|
|
| 56 |
| Compilation | `torch.compile` enabled |
|
| 57 |
| Validation / checkpoint cadence | every `1,000` steps |
|
| 58 |
|
| 59 |
-
##
|
| 60 |
|
| 61 |
- [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae)
|
| 62 |
- [This model card](https://huggingface.co/data-archetype/capacitor_decoder)
|
|
@@ -75,4 +98,3 @@ configuration uses:
|
|
| 75 |
url = {https://huggingface.co/data-archetype/capacitor_decoder},
|
| 76 |
}
|
| 77 |
```
|
| 78 |
-
|
|
|
|
| 30 |
sides. If whitening is enabled upstream, keep the decoder default. If whitening
|
| 31 |
is disabled upstream, disable dewhitening in the decoder too.
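The whitening rule above can be illustrated with a minimal sketch. It assumes whitening is a per-channel affine normalization with a stored mean and standard deviation, which this section does not spell out; `whiten`, `dewhiten`, `decode_input`, and `dewhiten_enabled` are hypothetical names, not the decoder's actual API.

```python
from typing import List, Sequence


def whiten(latent: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Upstream step (assumed form): normalize each channel to zero mean, unit scale."""
    return [(x - m) / s for x, m, s in zip(latent, mean, std)]


def dewhiten(latent: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Decoder-side inverse of `whiten`."""
    return [x * s + m for x, m, s in zip(latent, mean, std)]


def decode_input(
    latent: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
    dewhiten_enabled: bool = True,  # hypothetical flag; True mirrors "keep the decoder default"
) -> List[float]:
    """Return the latent in the raw space that the decoder body expects."""
    return dewhiten(latent, mean, std) if dewhiten_enabled else list(latent)


raw = [1.0, -2.0, 0.5]
mean, std = [0.5, -1.0, 0.25], [2.0, 4.0, 0.5]

# Consistent: whitened upstream, dewhitened in the decoder.
matched = decode_input(whiten(raw, mean, std), mean, std, dewhiten_enabled=True)

# Inconsistent: not whitened upstream, but the decoder still dewhitens.
mismatched = decode_input(raw, mean, std, dewhiten_enabled=True)
```

In the consistent case the decoder sees the original latent again; in the mismatched case every channel is rescaled and shifted, which is the failure the invariant guards against.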
|
| 32 |
|
| 33 |
+
## 3. Decode Speed
|
| 34 |
+
|
| 35 |
+
### 3.1 RTX 5090
|
| 36 |
+
|
| 37 |
+
| Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
|---:|---:|---:|---:|---:|---:|---:|
| `512x512` | `6.15x` | `61.5%` | `3.89` | `23.94` | `356.2 MiB` | `925.5 MiB` |
| `1024x1024` | `11.98x` | `80.8%` | `9.86` | `118.19` | `540.2 MiB` | `2815.2 MiB` |
| `2048x2048` | `10.81x` | `87.7%` | `52.12` | `563.28` | `1277.8 MiB` | `10371.8 MiB` |
|
| 42 |
+
|
| 43 |
+
These measurements are decode-only, use sequential batch-1 decode over cached
|
| 44 |
+
latents, and were run on an `NVIDIA GeForce RTX 5090`.
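A sequential batch-1, decode-only measurement over cached latents could be timed roughly as follows. This is a sketch, not the harness used for the tables: the warmup count, the `benchmark_decode` name, and the `sync` hook are assumptions, and `decode` stands in for whichever decoder call is being measured.

```python
import time
from typing import Any, Callable, Iterable, Optional


def benchmark_decode(
    decode: Callable[[Any], Any],
    latents: Iterable[Any],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 2,
) -> float:
    """Return mean milliseconds per image for one-at-a-time decoding."""
    cached = list(latents)  # latents are precomputed, so encoding is not timed
    if not cached:
        raise ValueError("need at least one cached latent")

    # Warmup passes absorb one-time costs such as compilation or allocator growth.
    for latent in cached[:warmup]:
        decode(latent)
    if sync is not None:
        sync()  # wait for queued asynchronous work before starting the clock

    start = time.perf_counter()
    for latent in cached:  # batch size 1, strictly sequential
        decode(latent)
    if sync is not None:
        sync()  # make sure all decodes finished before stopping the clock
    elapsed = time.perf_counter() - start

    return 1000.0 * elapsed / len(cached)
```

On a CUDA device, `sync` would typically be `torch.cuda.synchronize`, and peak memory can be read with `torch.cuda.max_memory_allocated()` after calling `torch.cuda.reset_peak_memory_stats()` before the timed loop.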
|
| 45 |
+
|
| 46 |
+
### 3.2 GH200
|
| 47 |
+
|
| 48 |
+
| Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
|---:|---:|---:|---:|---:|---:|---:|
| `512x512` | `1.85x` | `59.3%` | `11.40` | `21.14` | `391.6 MiB` | `961.9 MiB` |
| `1024x1024` | `3.28x` | `79.1%` | `26.31` | `86.24` | `601.4 MiB` | `2876.4 MiB` |
| `2048x2048` | `4.70x` | `86.4%` | `86.29` | `405.84` | `1437.4 MiB` | `10531.4 MiB` |
|
| 53 |
+
|
| 54 |
+
These measurements are decode-only and were run on an `NVIDIA GH200`.
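The two derived columns follow from the raw measurements: speedup is the FLUX.2 VAE time divided by the capacitor_decoder time, and peak VRAM reduction is one minus the ratio of the two peak allocations. The sketch below recomputes them for the GH200 rows; the function names are illustrative, and values recomputed from the rounded table entries can differ from the published figures in the last digit.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate decodes one image."""
    return baseline_ms / candidate_ms


def vram_reduction_pct(baseline_mib: float, candidate_mib: float) -> float:
    """Percentage of the baseline's peak VRAM that the candidate saves."""
    return 100.0 * (1.0 - candidate_mib / baseline_mib)


# GH200 rows: (resolution, decoder ms, FLUX.2 ms, decoder MiB, FLUX.2 MiB)
gh200_rows = [
    ("512x512", 11.40, 21.14, 391.6, 961.9),
    ("1024x1024", 26.31, 86.24, 601.4, 2876.4),
    ("2048x2048", 86.29, 405.84, 1437.4, 10531.4),
]

derived = {
    res: (round(speedup(flux_ms, dec_ms), 2), round(vram_reduction_pct(flux_mib, dec_mib), 1))
    for res, dec_ms, flux_ms, dec_mib, flux_mib in gh200_rows
}

for res, (sx, pct) in derived.items():
    print(f"{res}: {sx:.2f}x faster, {pct:.1f}% less peak VRAM")
```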
|
| 55 |
+
|
| 56 |
+
## 4. Training
|
| 57 |
|
| 58 |
This export corresponds to roughly **300k training steps**. The saved run
|
| 59 |
configuration uses:
|
|
|
|
| 79 |
| Compilation | `torch.compile` enabled |
|
| 80 |
| Validation / checkpoint cadence | every `1,000` steps |
|
| 81 |
|
| 82 |
+
## 5. Links
|
| 83 |
|
| 84 |
- [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae)
|
| 85 |
- [This model card](https://huggingface.co/data-archetype/capacitor_decoder)
|
|
|
|
| 98 |
url = {https://huggingface.co/data-archetype/capacitor_decoder},
|
| 99 |
}
|
| 100 |
```
|
|
|