data-archetype/capacitor_decoder

@@ -22,14 +22,11 @@ architecture.
 | Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
 |---:|---:|---:|---:|---:|---:|---:|
-| `512x512` | `6.15x` | `61.5%` | `3.89` | `23.94` | `356.2 MiB` | `925.5 MiB` |
-| `1024x1024` | `11.98x` | `80.8%` | `9.86` | `118.19` | `540.2 MiB` | `2815.2 MiB` |
-| `2048x2048` | `10.81x` | `87.7%` | `52.12` | `563.28` | `1277.8 MiB` | `10371.8 MiB` |
-These measurements are decode-only and were run on an `NVIDIA GeForce RTX 5090`.
-Each image is first encoded once with the same FLUX.2 encoder, latents are
-cached in memory, and then both decoders are timed over the same cached latent
-set.
 ## 2k PSNR Benchmark
@@ -90,9 +87,8 @@ with torch.inference_mode():
     posterior = flux2.encode(image.to(device=device, dtype=torch.bfloat16))
     latent_mean = posterior.latent_dist.mean
-    # Default path: match the usual FLUX.2 convention.
-    # Whiten here, then let capacitor_decoder unwhiten internally before decode.
-    latents = flux2_patchify_and_whiten(latent_mean, flux2)
     recon = decoder.decode(
         latents,
         height=int(image.shape[-2]),
@@ -127,3 +123,4 @@ upstream and call `decode(..., latents_are_flux2_whitened=False)`.
   url     = {https://huggingface.co/data-archetype/capacitor_decoder},
 }
 ```

 | Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
 |---:|---:|---:|---:|---:|---:|---:|
+| `512x512` | `3.41x` | `61.8%` | `7.34` | `25.03` | `351.2 MiB` | `920.5 MiB` |
+| `1024x1024` | `10.80x` | `81.4%` | `11.60` | `125.35` | `520.2 MiB` | `2795.2 MiB` |
+| `2048x2048` | `10.95x` | `88.4%` | `55.81` | `611.34` | `1197.8 MiB` | `10291.8 MiB` |
+These measurements are decode-only and were run on an `NVIDIA GeForce RTX 5090` in `bfloat16`. Both decoders are timed with sequential batch-1 decodes over the same cached latent set.
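As a sanity check on the updated table above, the two derived columns can be recomputed from the measured ones. This is a minimal sketch, assuming that "Speedup vs FLUX.2" is the ratio of per-image decode times and that "Peak VRAM Reduction" is measured relative to the FLUX.2 peak; the function names are illustrative and not part of `capacitor_decoder`. The published figures were presumably derived from unrounded measurements, so the last digit may differ slightly.

```python
# Measured columns copied from the updated benchmark table:
# (resolution, capacitor_decoder ms/image, FLUX.2 VAE ms/image,
#  capacitor_decoder peak MiB, FLUX.2 peak MiB)
ROWS = [
    ("512x512", 7.34, 25.03, 351.2, 920.5),
    ("1024x1024", 11.60, 125.35, 520.2, 2795.2),
    ("2048x2048", 55.81, 611.34, 1197.8, 10291.8),
]


def speedup(fast_ms: float, slow_ms: float) -> float:
    """Assumed definition: how many times faster the fast decoder is."""
    return slow_ms / fast_ms


def vram_reduction_pct(small_mib: float, large_mib: float) -> float:
    """Assumed definition: percentage of the larger peak that is saved."""
    return 100.0 * (1.0 - small_mib / large_mib)


def derived_metrics(rows):
    """Map each resolution to (speedup, peak VRAM reduction in percent)."""
    return {
        res: (speedup(cap_ms, flux_ms), vram_reduction_pct(cap_mib, flux_mib))
        for res, cap_ms, flux_ms, cap_mib, flux_mib in rows
    }


if __name__ == "__main__":
    for res, (ratio, reduction) in derived_metrics(ROWS).items():
        print(f"{res}: {ratio:.2f}x speedup, {reduction:.1f}% less peak VRAM")
```

Recomputing from the rounded table values this way is only a consistency check; it does not reproduce the benchmark itself.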
 ## 2k PSNR Benchmark
     posterior = flux2.encode(image.to(device=device, dtype=torch.bfloat16))
     latent_mean = posterior.latent_dist.mean
+    # Default path: whiten in float32, then cast back to model dtype before decode.
+    latents = flux2_patchify_and_whiten(latent_mean, flux2).to(dtype=torch.bfloat16)
     recon = decoder.decode(
         latents,
         height=int(image.shape[-2]),
   url     = {https://huggingface.co/data-archetype/capacitor_decoder},
 }
 ```