data-archetype committed on
Commit e1af58e · verified · 1 Parent(s): 1678188

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +7 -10
README.md CHANGED
@@ -22,14 +22,11 @@ architecture.
 
 | Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
 |---:|---:|---:|---:|---:|---:|---:|
-| `512x512` | `6.15x` | `61.5%` | `3.89` | `23.94` | `356.2 MiB` | `925.5 MiB` |
-| `1024x1024` | `11.98x` | `80.8%` | `9.86` | `118.19` | `540.2 MiB` | `2815.2 MiB` |
-| `2048x2048` | `10.81x` | `87.7%` | `52.12` | `563.28` | `1277.8 MiB` | `10371.8 MiB` |
+| `512x512` | `3.41x` | `61.8%` | `7.34` | `25.03` | `351.2 MiB` | `920.5 MiB` |
+| `1024x1024` | `10.80x` | `81.4%` | `11.60` | `125.35` | `520.2 MiB` | `2795.2 MiB` |
+| `2048x2048` | `10.95x` | `88.4%` | `55.81` | `611.34` | `1197.8 MiB` | `10291.8 MiB` |
 
-These measurements are decode-only and were run on an `NVIDIA GeForce RTX 5090`.
-Each image is first encoded once with the same FLUX.2 encoder, latents are
-cached in memory, and then both decoders are timed over the same cached latent
-set.
+These measurements are decode-only, were run on an `NVIDIA GeForce RTX 5090` in `bfloat16`, and time sequential batch-1 decode over the same cached latent set for both decoders.
 
 ## 2k PSNR Benchmark
 
@@ -90,9 +87,8 @@ with torch.inference_mode():
     posterior = flux2.encode(image.to(device=device, dtype=torch.bfloat16))
     latent_mean = posterior.latent_dist.mean
 
-    # Default path: match the usual FLUX.2 convention.
-    # Whiten here, then let capacitor_decoder unwhiten internally before decode.
-    latents = flux2_patchify_and_whiten(latent_mean, flux2)
+    # Default path: whiten in float32, then cast back to model dtype before decode.
+    latents = flux2_patchify_and_whiten(latent_mean, flux2).to(dtype=torch.bfloat16)
     recon = decoder.decode(
         latents,
         height=int(image.shape[-2]),
@@ -127,3 +123,4 @@ upstream and call `decode(..., latents_are_flux2_whitened=False)`.
 url = {https://huggingface.co/data-archetype/capacitor_decoder},
 }
 ```
+
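
The derived columns of the updated benchmark table can be sanity-checked from its raw columns. The sketch below assumes that the speedup is the ratio of the two ms/image figures and that the peak VRAM reduction is one minus the ratio of the two peak figures. The README does not state these formulas, and the helper names (`speedup`, `vram_reduction_pct`, `derived_columns`) are hypothetical. Because the published inputs are already rounded, recomputed values may differ from the table in the last digit.

```python
# Added-side rows from the diff:
# (resolution, capacitor_decoder ms/image, FLUX.2 VAE ms/image,
#  capacitor_decoder peak MiB, FLUX.2 peak MiB)
ROWS = [
    ("512x512", 7.34, 25.03, 351.2, 920.5),
    ("1024x1024", 11.60, 125.35, 520.2, 2795.2),
    ("2048x2048", 55.81, 611.34, 1197.8, 10291.8),
]


def speedup(cap_ms, flux_ms):
    # Assumed definition: how many times faster capacitor_decoder is per image.
    return flux_ms / cap_ms


def vram_reduction_pct(cap_mib, flux_mib):
    # Assumed definition: relative drop in peak VRAM, as a percentage.
    return 100.0 * (1.0 - cap_mib / flux_mib)


def derived_columns(rows=ROWS):
    # Map each resolution to (speedup, peak VRAM reduction in percent).
    return {
        res: (speedup(cap_ms, flux_ms), vram_reduction_pct(cap_mib, flux_mib))
        for res, cap_ms, flux_ms, cap_mib, flux_mib in rows
    }


if __name__ == "__main__":
    for res, (s, v) in derived_columns().items():
        print(f"{res}: {s:.2f}x speedup, {v:.1f}% peak VRAM reduction")
```

Under these assumed definitions, the recomputed values agree with the table's `Speedup vs FLUX.2` and `Peak VRAM Reduction` columns to within rounding of the inputs.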
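
The measurement protocol described in the added README line (decode-only, sequential batch-1, the same cached latent set for both decoders) could be sketched roughly as follows. This is not the author's benchmark script: `time_sequential_decode`, the `warmup` count, and the `sync` hook are hypothetical names introduced here for illustration, and each decoder is passed in as a plain callable. On a GPU, `sync` would typically be `torch.cuda.synchronize`, so that asynchronously launched kernels are included in the timing.

```python
import time
from typing import Any, Callable, Optional, Sequence


def time_sequential_decode(
    decode_fn: Callable[[Any], Any],
    cached_latents: Sequence[Any],
    warmup: int = 1,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return mean milliseconds per image for one-at-a-time decoding."""
    if not cached_latents:
        raise ValueError("cached_latents must not be empty")

    # Untimed warmup passes (an assumption; the README does not mention warmup).
    for latent in cached_latents[:warmup]:
        decode_fn(latent)
    if sync is not None:
        sync()

    start = time.perf_counter()
    for latent in cached_latents:  # batch size 1: one latent per call
        decode_fn(latent)
    if sync is not None:
        sync()  # wait for pending device work before stopping the clock
    elapsed = time.perf_counter() - start

    return 1000.0 * elapsed / len(cached_latents)
```

Calling this once per decoder with the same `cached_latents` list gives two ms/image figures whose ratio corresponds to a speedup column like the one in the table.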