# FLUX.2-klein-4b FP8 - Diffusers Transformer
Diffusers-compatible transformer-only weights for FLUX.2-klein-4B, converted from Black Forest Labs' FP8 checkpoint (ComfyUI format).
This repo does not contain the full pipeline. Text encoders, VAE, and scheduler are loaded from black-forest-labs/FLUX.2-klein-4B.
## Available variants
| Subfolder | Precision | Format | Size | Use case |
|---|---|---|---|---|
| `transformer_bf16/` | bfloat16 | safetensors | ~7.7 GB | LoRA training, evaluation baselines, re-quantization |
| `transformer_fp8_static/` | float8_e4m3fn | torchao `.pt` | ~3.9 GB | Production inference (~2x memory saving) |
### bf16

Lossless dequantization of BFL's FP8 weights (bf16 can represent all float8_e4m3fn values exactly). This is the recommended starting point for fine-tuning or LoRA training: the weights are numerically identical to BFL's original FP8 model.
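The exactness claim can be verified exhaustively in plain Python. The decoder below is written for illustration (it is not code from this repo), following the float8_e4m3fn layout of 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits:

```python
import struct

def f32_bits(x):
    """IEEE-754 float32 bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def decode_e4m3fn(b):
    """Decode one float8_e4m3fn bit pattern (returns None for NaN).
    The 'fn' variant has no infinities and a single NaN encoding per sign."""
    sign = -1.0 if b & 0x80 else 1.0
    exp = (b >> 3) & 0xF
    man = b & 0x7
    if exp == 0xF and man == 0x7:   # the only NaN pattern (per sign)
        return None
    if exp == 0:                    # subnormal: 2**-6 * man/8
        return sign * man * 2.0 ** -9
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

# A value fits bfloat16 exactly iff the low 16 bits of its float32
# encoding are zero (bf16 is float32 truncated to 16 bits).
exact = all(
    f32_bits(v) & 0xFFFF == 0
    for b in range(256)
    if (v := decode_e4m3fn(b)) is not None
)
print(exact)  # True
```

All 256 bit patterns pass: the 3-bit mantissa fits inside bf16's 7-bit mantissa, and the full e4m3fn exponent range (2^-9 subnormals up to the maximum finite value 448) lies well inside bf16's normal range.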
### FP8 static (torchao)

Both weights and activations are quantized to float8_e4m3fn. Activation scales are the original per-layer `input_scale` values from BFL's calibration. The checkpoint is a `torch.save` dict containing:

- `state_dict`: torchao `AffineQuantizedTensor` weights
- `act_scales`: per-Linear static activation scales (float32)
- `fp8_dtype`: `"float8_e4m3fn"`
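Because the scales are static, quantizing an activation at inference time is just a divide-and-clamp with the precomputed per-layer scale; no scale is computed per batch. A minimal sketch, with a hypothetical function name and omitting the final round to the e4m3 grid that a real torchao kernel performs:

```python
E4M3_MAX = 448.0  # largest finite float8_e4m3fn magnitude

def quantize_activation_static(x, input_scale):
    """Static FP8 activation quantization sketch (illustrative, not the
    torchao implementation): divide by the calibration-time scale, then
    clamp to the e4m3fn representable range."""
    return [max(-E4M3_MAX, min(E4M3_MAX, v / input_scale)) for v in x]

print(quantize_activation_static([1000.0, -0.75, 3.0], input_scale=2.0))
# [448.0, -0.375, 1.5]
```

Static scales trade a small accuracy risk on out-of-calibration inputs (values beyond the clamp saturate) for the latency win of skipping per-batch max reductions.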
## Usage: bf16

```python
from diffusers import Flux2Transformer2DModel, Flux2KleinPipeline
from PIL import Image
import torch

# Load transformer (bf16)
transformer = Flux2Transformer2DModel.from_pretrained(
    "photoroom/FLUX.2-klein-4b-fp8-diffusers",
    subfolder="transformer_bf16",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load pipeline (text encoders, VAE, scheduler from BFL)
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")  # text encoders and VAE must be on GPU as well

# Run inference (image-to-image: pass a reference image)
image = Image.open("input.png").convert("RGB")
result = pipe(
    prompt="a product on a marble countertop",
    image=[image],
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
result.save("output.png")
```
## Usage: FP8 static

```python
from diffusers import Flux2Transformer2DModel, Flux2KleinPipeline
from huggingface_hub import hf_hub_download
from load_torchao import load_torchao_fp8_static_model  # helper script in this repo
import torch

# Download and load the FP8 static transformer checkpoint
ckpt_path = hf_hub_download(
    "photoroom/FLUX.2-klein-4b-fp8-diffusers",
    filename="transformer_fp8_static/model_fp8_static.pt",
)
transformer = load_torchao_fp8_static_model(
    ckpt_path=ckpt_path,
    base_model_or_factory=lambda: Flux2Transformer2DModel.from_pretrained(
        "photoroom/FLUX.2-klein-4b-fp8-diffusers",
        subfolder="transformer_bf16",
        torch_dtype=torch.bfloat16,
    ),
    device="cuda",
)

# Load pipeline (text encoders, VAE, scheduler from BFL)
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")  # text encoders and VAE must be on GPU as well

# Run inference (text-to-image: no reference image)
result = pipe(
    prompt="a cat holding a frame with FP8 writing on it",
    image=[None],
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
result.save("output.png")
```
## Quality comparison: Original bf16 vs Dequantized bf16 vs FP8 static
Side-by-side text-to-image comparison at 1024x1024, 4 steps, guidance_scale=1.0. Prompts are chosen to stress fine details, textures, gradients, and high-frequency patterns.
Each column shows: Original BFL bf16 | Dequantized bf16 (this repo) | FP8 static (this repo).
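Side-by-side grids are qualitative; a quick numeric sanity check is the PSNR between the original-bf16 output and each converted variant's output at the same seed. A minimal sketch (hypothetical helper, not shipped in this repo) over flattened pixel values:

```python
import math

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel
    sequences; higher means closer, identical inputs give infinity."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

print(psnr([255.0], [0.0]))  # 0.0
```

To compare two `PIL.Image` outputs, flatten both to per-channel pixel lists (e.g. `list(img.getdata())`) before calling `psnr`.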
### Prompts used
- Fine text + wood grain: "A close-up photograph of a vintage wooden sign that reads 'OPEN DAILY 9AM-6PM' in hand-painted white serif letters on a dark green background, peeling paint revealing wood grain underneath, tiny rusty nail heads, cobwebs in the corner, shot with a macro lens"
- High-frequency fabric: "Flat lay photograph of a neatly folded black and white houndstooth wool blazer next to a herringbone tweed scarf on a clean white marble surface with fine grey veining, visible individual wool fibers, top-down view, 8K product photography"
- Gradients + caustics: "A single chrome sphere resting on a wet black surface reflecting a sunset sky gradient from deep orange to violet, tiny water droplets scattered around it catching light as caustic sparkles, distant city skyline reflected in the sphere, photorealistic 8K"
- Grass + nature macro: "Extreme close-up of a freshly mowed lawn with individual grass blades in sharp focus, morning dew droplets on each blade refracting light into tiny rainbows, a small ladybug crawling on one blade, scattered clover leaves with visible vein patterns, macro photography, f/2.8 bokeh in the background"
- Architecture detail: "Aerial photograph of a Baroque cathedral rooftop showing hundreds of individual terracotta roof tiles, ornate stone gargoyles with weathered faces, tiny stained glass windows with visible lead cames, moss growing between cracks, pigeons perched on ledges, ultra detailed 8K drone photography"
## License
This model is a derivative of FLUX.2-klein-4B and is subject to the FLUX.2-klein-4B license.