Commit 1678188 by data-archetype · verified · 1 Parent(s): 937821f

Update decoder benchmark docs for RTX 5090 and GH200

technical_report_capacitor_decoder.md CHANGED
@@ -30,7 +30,30 @@ The important invariant is that whitening must be handled consistently on both
  sides. If whitening is enabled upstream, keep the decoder default. If whitening
  is disabled upstream, disable dewhitening in the decoder too.
 
- ## 3. Training
 
  This export corresponds to roughly **300k training steps**. The saved run
  configuration uses:
@@ -56,7 +79,7 @@ configuration uses:
  | Compilation | `torch.compile` enabled |
  | Validation / checkpoint cadence | every `1,000` steps |
 
- ## 4. Links
 
  - [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae)
  - [This model card](https://huggingface.co/data-archetype/capacitor_decoder)
@@ -75,4 +98,3 @@ configuration uses:
  url = {https://huggingface.co/data-archetype/capacitor_decoder},
  }
  ```
-
 
  sides. If whitening is enabled upstream, keep the decoder default. If whitening
  is disabled upstream, disable dewhitening in the decoder too.
 
+ ## 3. Decode Speed
+
+ ### 3.1 RTX 5090
+
+ | Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
+ |---:|---:|---:|---:|---:|---:|---:|
+ | `512x512` | `6.15x` | `61.5%` | `3.89` | `23.94` | `356.2 MiB` | `925.5 MiB` |
+ | `1024x1024` | `11.98x` | `80.8%` | `9.86` | `118.19` | `540.2 MiB` | `2815.2 MiB` |
+ | `2048x2048` | `10.81x` | `87.7%` | `52.12` | `563.28` | `1277.8 MiB` | `10371.8 MiB` |
+
+ These measurements are decode-only, use sequential batch-1 decode over cached
+ latents, and were run on an `NVIDIA GeForce RTX 5090`.
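
The measurement setup described above (decode-only, sequential batch-1, cached latents) could be timed with a loop like the following. This is a minimal sketch, not the benchmark script behind the table: `time_decode_ms`, its `warmup` and `sync` parameters, and the stand-in decoder are all hypothetical names introduced for illustration.

```python
import time
from statistics import mean
from typing import Any, Callable, Optional, Sequence


def time_decode_ms(
    decode: Callable[[Any], Any],
    latents: Sequence[Any],
    warmup: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return mean wall-clock milliseconds per image for batch-1 decode."""
    if not latents:
        raise ValueError("need at least one cached latent")

    # Warm-up passes are excluded from timing (e.g. one-time compilation cost).
    for latent in latents[:warmup]:
        decode(latent)
    if sync is not None:
        sync()

    per_image_ms = []
    for latent in latents:  # sequential: one cached latent per call
        start = time.perf_counter()
        decode(latent)
        if sync is not None:
            sync()  # wait for asynchronous device work before stopping the clock
        per_image_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(per_image_ms)


if __name__ == "__main__":
    # Stand-in decoder (assumption): sleeps ~2 ms instead of running a model.
    fake_latents = list(range(8))
    ms = time_decode_ms(lambda z: time.sleep(0.002), fake_latents)
    print(f"{ms:.2f} ms/image")
```

With PyTorch on CUDA, `sync` would typically be `torch.cuda.synchronize`, and peak memory can be read with `torch.cuda.reset_peak_memory_stats()` followed by `torch.cuda.max_memory_allocated()`. Whether the reported numbers were gathered this way is not stated in the report.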
+
+ ### 3.2 GH200
+
+ | Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
+ |---:|---:|---:|---:|---:|---:|---:|
+ | `512x512` | `1.85x` | `59.3%` | `11.40` | `21.14` | `391.6 MiB` | `961.9 MiB` |
+ | `1024x1024` | `3.28x` | `79.1%` | `26.31` | `86.24` | `601.4 MiB` | `2876.4 MiB` |
+ | `2048x2048` | `4.70x` | `86.4%` | `86.29` | `405.84` | `1437.4 MiB` | `10531.4 MiB` |
+
+ These measurements are decode-only and were run on an `NVIDIA GH200`.
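
The two derived columns in the tables above appear to follow from the raw columns as speedup = FLUX.2 ms / capacitor_decoder ms and reduction = 1 - capacitor_decoder VRAM / FLUX.2 VRAM. The sketch below recomputes them from the rounded values shown in the tables. The formulas are an assumption inferred from the numbers, and the function names are hypothetical.

```python
# Each row: (gpu, resolution, capacitor_ms, flux2_ms,
#            capacitor_mib, flux2_mib, reported_speedup, reported_reduction_pct)
ROWS = [
    ("RTX 5090", "512x512", 3.89, 23.94, 356.2, 925.5, 6.15, 61.5),
    ("RTX 5090", "1024x1024", 9.86, 118.19, 540.2, 2815.2, 11.98, 80.8),
    ("RTX 5090", "2048x2048", 52.12, 563.28, 1277.8, 10371.8, 10.81, 87.7),
    ("GH200", "512x512", 11.40, 21.14, 391.6, 961.9, 1.85, 59.3),
    ("GH200", "1024x1024", 26.31, 86.24, 601.4, 2876.4, 3.28, 79.1),
    ("GH200", "2048x2048", 86.29, 405.84, 1437.4, 10531.4, 4.70, 86.4),
]


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def vram_reduction_pct(baseline_mib: float, candidate_mib: float) -> float:
    """Percentage of baseline peak VRAM saved by the candidate."""
    return (1.0 - candidate_mib / baseline_mib) * 100.0


def recompute(rows=ROWS):
    """Return (gpu, res, speedup, reduction_pct, reported_x, reported_pct)."""
    out = []
    for gpu, res, cap_ms, flux_ms, cap_mib, flux_mib, rep_x, rep_pct in rows:
        out.append((
            gpu, res,
            speedup(flux_ms, cap_ms),
            vram_reduction_pct(flux_mib, cap_mib),
            rep_x, rep_pct,
        ))
    return out


if __name__ == "__main__":
    for gpu, res, x, pct, rep_x, rep_pct in recompute():
        print(f"{gpu} {res}: {x:.2f}x (reported {rep_x:.2f}x), "
              f"{pct:.1f}% (reported {rep_pct:.1f}%)")
```

Recomputed values agree with the reported columns to within rounding. The one visible difference is the RTX 5090 `1024x1024` speedup, which comes out near `11.99x` from the rounded timings versus the reported `11.98x`, presumably because the reported figure was derived from unrounded timings.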
+
+ ## 4. Training
 
  This export corresponds to roughly **300k training steps**. The saved run
  configuration uses:
 
  | Compilation | `torch.compile` enabled |
  | Validation / checkpoint cadence | every `1,000` steps |
 
+ ## 5. Links
 
  - [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae)
  - [This model card](https://huggingface.co/data-archetype/capacitor_decoder)
 
  url = {https://huggingface.co/data-archetype/capacitor_decoder},
  }
  ```