@@ -30,7 +30,30 @@ The important invariant is that whitening must be handled consistently on both
 sides. If whitening is enabled upstream, keep the decoder default. If whitening
 is disabled upstream, disable dewhitening in the decoder too.
-## 3. Training
 This export corresponds to roughly **300k training steps**. The saved run
 configuration uses:
@@ -56,7 +79,7 @@ configuration uses:
 | Compilation | `torch.compile` enabled |
 | Validation / checkpoint cadence | every `1,000` steps |
-## 4. Links
 - [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae)
 - [This model card](https://huggingface.co/data-archetype/capacitor_decoder)
@@ -75,4 +98,3 @@ configuration uses:
   url     = {https://huggingface.co/data-archetype/capacitor_decoder},
 }
 ```

 sides. If whitening is enabled upstream, keep the decoder default. If whitening
 is disabled upstream, disable dewhitening in the decoder too.
+## 3. Decode Speed
+### 3.1 RTX 5090
+| Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
+|---:|---:|---:|---:|---:|---:|---:|
+| `512x512` | `6.15x` | `61.5%` | `3.89` | `23.94` | `356.2 MiB` | `925.5 MiB` |
+| `1024x1024` | `11.98x` | `80.8%` | `9.86` | `118.19` | `540.2 MiB` | `2815.2 MiB` |
+| `2048x2048` | `10.81x` | `87.7%` | `52.12` | `563.28` | `1277.8 MiB` | `10371.8 MiB` |
+These measurements are decode-only, use sequential batch-1 decode over cached
+latents, and were run on an `NVIDIA GeForce RTX 5090`.
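
The card does not say which harness produced these timings. As a minimal sketch of what "sequential batch-1 decode over cached latents" could look like, the helper below times one decode call per latent after a few untimed warm-up calls. The names `time_sequential_decode`, `decode_fn`, `warmup`, and `sync` are hypothetical and are not part of `capacitor_decoder`; the stand-in decoder at the bottom only exists to make the example runnable.

```python
import time
from typing import Any, Callable, Iterable, Optional


def time_sequential_decode(
    decode_fn: Callable[[Any], Any],
    latents: Iterable[Any],
    warmup: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock milliseconds per batch-1 decode call.

    decode_fn and sync are hypothetical hooks, not a real capacitor_decoder API.
    """
    cached = list(latents)  # latents are prepared up front, so encoding is not timed
    if not cached:
        raise ValueError("need at least one cached latent")

    # Warm-up calls are excluded so one-time costs (for example compilation
    # or allocator growth) do not skew the measurement.
    for z in cached[:warmup]:
        decode_fn(z)
    if sync is not None:
        sync()

    start = time.perf_counter()
    for z in cached:
        decode_fn(z)  # one latent per call: sequential batch-1 decode
    if sync is not None:
        sync()  # wait for queued device work before stopping the clock
    elapsed = time.perf_counter() - start

    return 1000.0 * elapsed / len(cached)


# Stand-in decoder and latents, purely illustrative.
fake_latents = [[0.0] * 4 for _ in range(8)]
ms_per_image = time_sequential_decode(lambda z: [2.0 * v for v in z], fake_latents)
print(f"{ms_per_image:.4f} ms/image")
```

On a CUDA device with PyTorch, passing `sync=torch.cuda.synchronize` keeps asynchronous kernel launches from being under-counted, and peak memory can be read by calling `torch.cuda.reset_peak_memory_stats()` before the timed loop and `torch.cuda.max_memory_allocated()` after it. Whether the published numbers were collected this way is an assumption.
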
+### 3.2 GH200
+| Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
+|---:|---:|---:|---:|---:|---:|---:|
+| `512x512` | `1.85x` | `59.3%` | `11.40` | `21.14` | `391.6 MiB` | `961.9 MiB` |
+| `1024x1024` | `3.28x` | `79.1%` | `26.31` | `86.24` | `601.4 MiB` | `2876.4 MiB` |
+| `2048x2048` | `4.70x` | `86.4%` | `86.29` | `405.84` | `1437.4 MiB` | `10531.4 MiB` |
+These measurements are decode-only and were run on an `NVIDIA GH200`.
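
The two derived columns in the tables above appear to follow directly from the raw measurements: speedup as the ratio of FLUX.2 VAE time to capacitor_decoder time, and peak VRAM reduction as the relative drop in peak memory. The sketch below recomputes them for two rows; the function names are illustrative, and the formulas are an assumption inferred from the published values rather than something the card states.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def peak_vram_reduction_pct(baseline_mib: float, candidate_mib: float) -> float:
    """Relative drop in peak memory, as a percentage of the baseline."""
    return 100.0 * (1.0 - candidate_mib / baseline_mib)


# RTX 5090, 512x512 row
print(f"{speedup(23.94, 3.89):.2f}x")                      # 6.15x
print(f"{peak_vram_reduction_pct(925.5, 356.2):.1f}%")     # 61.5%

# GH200, 2048x2048 row
print(f"{speedup(405.84, 86.29):.2f}x")                    # 4.70x
print(f"{peak_vram_reduction_pct(10531.4, 1437.4):.1f}%")  # 86.4%
```

Recomputing from the rounded table entries can differ in the last digit: the RTX 5090 `1024x1024` row gives `11.99x` this way rather than the listed `11.98x`, presumably because the listed ratios were computed from unrounded timings.
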
+## 4. Training
 This export corresponds to roughly **300k training steps**. The saved run
 configuration uses:
 | Compilation | `torch.compile` enabled |
 | Validation / checkpoint cadence | every `1,000` steps |
+## 5. Links
 - [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae)
 - [This model card](https://huggingface.co/data-archetype/capacitor_decoder)
   url     = {https://huggingface.co/data-archetype/capacitor_decoder},
 }
 ```