Upload README.md with huggingface_hub
README.md CHANGED
@@ -22,14 +22,11 @@ architecture.

 | Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
 |---:|---:|---:|---:|---:|---:|---:|
-| `512x512` | `
-| `1024x1024` | `
-| `2048x2048` | `10.

-These measurements are decode-only
-Each image is first encoded once with the same FLUX.2 encoder, latents are
-cached in memory, and then both decoders are timed over the same cached latent
-set.

 ## 2k PSNR Benchmark

@@ -90,9 +87,8 @@ with torch.inference_mode():
     posterior = flux2.encode(image.to(device=device, dtype=torch.bfloat16))
     latent_mean = posterior.latent_dist.mean

-    # Default path:
-
-    latents = flux2_patchify_and_whiten(latent_mean, flux2)
     recon = decoder.decode(
         latents,
         height=int(image.shape[-2]),

@@ -127,3 +123,4 @@ upstream and call `decode(..., latents_are_flux2_whitened=False)`.
 url = {https://huggingface.co/data-archetype/capacitor_decoder},
 }
 ```

@@ -22,14 +22,11 @@ architecture.

 | Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
 |---:|---:|---:|---:|---:|---:|---:|
+| `512x512` | `3.41x` | `61.8%` | `7.34` | `25.03` | `351.2 MiB` | `920.5 MiB` |
+| `1024x1024` | `10.80x` | `81.4%` | `11.60` | `125.35` | `520.2 MiB` | `2795.2 MiB` |
+| `2048x2048` | `10.95x` | `88.4%` | `55.81` | `611.34` | `1197.8 MiB` | `10291.8 MiB` |

+These measurements are decode-only, were run on an `NVIDIA GeForce RTX 5090` in `bfloat16`, and time sequential batch-1 decode over the same cached latent set for both decoders.

 ## 2k PSNR Benchmark

@@ -90,9 +87,8 @@ with torch.inference_mode():
     posterior = flux2.encode(image.to(device=device, dtype=torch.bfloat16))
     latent_mean = posterior.latent_dist.mean

+    # Default path: whiten in float32, then cast back to model dtype before decode.
+    latents = flux2_patchify_and_whiten(latent_mean, flux2).to(dtype=torch.bfloat16)
     recon = decoder.decode(
         latents,
         height=int(image.shape[-2]),

@@ -127,3 +123,4 @@ upstream and call `decode(..., latents_are_flux2_whitened=False)`.
 url = {https://huggingface.co/data-archetype/capacitor_decoder},
 }
 ```
+
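
The two derived columns in the benchmark table added by this commit can be sanity-checked from the raw columns. The sketch below assumes the usual definitions, which the README does not state explicitly: speedup is the FLUX.2 VAE time divided by the capacitor_decoder time, and peak VRAM reduction is one minus the ratio of the two peak values. The function names `derived_columns` and `recompute` are hypothetical helpers for illustration only.

```python
# Rows copied from the benchmark table in this diff:
# (resolution, capacitor_decoder ms/image, FLUX.2 VAE ms/image,
#  capacitor_decoder peak VRAM in MiB, FLUX.2 peak VRAM in MiB)
ROWS = [
    ("512x512", 7.34, 25.03, 351.2, 920.5),
    ("1024x1024", 11.60, 125.35, 520.2, 2795.2),
    ("2048x2048", 55.81, 611.34, 1197.8, 10291.8),
]


def derived_columns(cap_ms, flux_ms, cap_mib, flux_mib):
    """Return (speedup, peak VRAM reduction in percent).

    Assumed definitions (not stated in the README):
      speedup        = baseline time / new time
      VRAM reduction = (1 - new peak / baseline peak) * 100
    """
    speedup = flux_ms / cap_ms
    vram_reduction_pct = (1.0 - cap_mib / flux_mib) * 100.0
    return speedup, vram_reduction_pct


def recompute(rows=ROWS):
    """Map each resolution to its recomputed derived columns."""
    return {res: derived_columns(*vals) for res, *vals in rows}


if __name__ == "__main__":
    for res, (speedup, reduction) in recompute().items():
        print(f"{res}: {speedup:.2f}x speedup, {reduction:.1f}% peak VRAM reduction")
```

Under these assumptions the recomputed values agree with the published columns to within rounding. The `1024x1024` speedup comes out slightly above the listed `10.80x` when derived from the rounded timings, which would be consistent with the table having been computed from unrounded measurements.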