---
library_name: diffusers
license: apache-2.0
datasets:
- laion/relaion400m
base_model:
- black-forest-labs/FLUX.2-dev
tags:
- tae
- taef2
---

# About

A Tiny AutoEncoder trained on the latent space of [black-forest-labs/FLUX.2-dev](https://huggingface.co/black-forest-labs/FLUX.2-dev)'s autoencoder. It converts between latent and image space up to 20x faster and with 28x fewer parameters than the full autoencoder, at the cost of a small amount of quality.

Code for this model is available [here](https://huggingface.co/fal/FLUX.2-Tiny-AutoEncoder/blob/main/flux2_tiny_autoencoder.py).
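
As a rough sanity check on the parameter-count claim, you can load both autoencoders and compare their sizes. A minimal sketch, assuming the full autoencoder lives in the `vae` subfolder of the (gated) base checkpoint:

```py
import torch
from diffusers import AutoModel

# The tiny autoencoder from this repo.
tiny_vae = AutoModel.from_pretrained(
    "fal/FLUX.2-Tiny-AutoEncoder", trust_remote_code=True, torch_dtype=torch.bfloat16
)
# Assumption: the full FLUX.2-dev autoencoder is in the "vae" subfolder
# of the base checkpoint (requires accepting its license).
full_vae = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev", subfolder="vae", torch_dtype=torch.bfloat16
)

def param_count(model: torch.nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())

print(f"tiny VAE: {param_count(tiny_vae):,} parameters")
print(f"full VAE: {param_count(full_vae):,} parameters")
print(f"reduction: {param_count(full_vae) / param_count(tiny_vae):.0f}x")
```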

# Round-Trip Comparisons

| Source | Image |
| ------ | ----- |
| https://www.pexels.com/photo/mirror-lying-on-open-book-11495792/ | ![compare_autoencoders_1](https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/u7ZnjY8FAwu09-iyEC_um.png) |
| https://www.pexels.com/photo/brown-hummingbird-selective-focus-photography-1133957/ | ![compare_autoencoders_2](https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/ZzvJu3VfrzlvZ7bDDASog.png) |
| https://www.pexels.com/photo/person-with-body-painting-1209843/ | ![compare_autoencoders_3](https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/B56LPhLYiGT0ffnBVIRbP.png) |

# Example Usage

```py
import torch
import torchvision.transforms.functional as F

from PIL import Image
from flux2_tiny_autoencoder import Flux2TinyAutoEncoder

device = torch.device("cuda")
tiny_vae = Flux2TinyAutoEncoder.from_pretrained(
    "fal/FLUX.2-Tiny-AutoEncoder",
).to(device=device, dtype=torch.bfloat16)

# Load an image and normalize it from [0, 1] to the [-1, 1] range the
# autoencoder expects, adding a leading batch dimension.
pil_image = Image.open("/path/to/image.png")
image_tensor = F.to_tensor(pil_image)
image_tensor = image_tensor.unsqueeze(0) * 2.0 - 1.0
image_tensor = image_tensor.to(device, dtype=tiny_vae.dtype)

# Round-trip: encode to latents, decode, then map [-1, 1] back to [0, 1].
with torch.inference_mode():
    latents = tiny_vae.encode(image_tensor, return_dict=False)
    recon = tiny_vae.decode(latents, return_dict=False)
    recon = recon.squeeze(0).clamp(-1, 1) / 2.0 + 0.5
    recon = recon.float().detach().cpu()

recon_image = F.to_pil_image(recon)
recon_image.save("reconstituted.png")
```

## Use with Diffusers 🧨

```py
import torch
from diffusers import AutoModel, Flux2Pipeline

device = torch.device("cuda")
tiny_vae = AutoModel.from_pretrained(
    "fal/FLUX.2-Tiny-AutoEncoder", trust_remote_code=True, torch_dtype=torch.bfloat16
).to(device)

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev", vae=tiny_vae, torch_dtype=torch.bfloat16
).to(device)
```
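
With the tiny autoencoder standing in for the full VAE, the pipeline is called as usual; a minimal sketch, assuming `Flux2Pipeline` follows the standard diffusers text-to-image interface (the prompt, step count, and seed below are placeholders):

```py
image = pipe(
    prompt="a hummingbird hovering over a flower, macro photo",
    num_inference_steps=28,
    generator=torch.Generator(device).manual_seed(0),
).images[0]
image.save("sample.png")
```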