This is the repository card for kernels-community/flash-mla, a kernel published on the Hub and built for use with the kernels library. This card was automatically generated.

How to use

# make sure `kernels` is installed: `pip install -U kernels`
from kernels import get_kernel

kernel_module = get_kernel("kernels-community/flash-mla", version=1)
__version__ = kernel_module.__version__

# assumption: `__version__` is a plain version string rather than a callable
print(__version__)

Available functions

  • __version__
  • FlashMLASchedMeta
  • get_mla_metadata
  • flash_mla_with_kvcache
  • flash_attn_varlen_func
  • flash_attn_varlen_qkvpacked_func
  • flash_attn_varlen_kvpacked_func
  • flash_mla_sparse_fwd
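
One way to confirm that a loaded build exposes the names listed above is to check the module's attributes. The sketch below is only illustrative: `missing_names` is a hypothetical helper, not part of the `kernels` library, and a stand-in object takes the place of the module returned by `get_kernel` so the snippet runs without a GPU.

```python
from types import SimpleNamespace

# Names taken from the "Available functions" list on this card.
EXPECTED_NAMES = [
    "__version__",
    "FlashMLASchedMeta",
    "get_mla_metadata",
    "flash_mla_with_kvcache",
    "flash_attn_varlen_func",
    "flash_attn_varlen_qkvpacked_func",
    "flash_attn_varlen_kvpacked_func",
    "flash_mla_sparse_fwd",
]


def missing_names(module, expected=EXPECTED_NAMES):
    """Return the expected names that `module` does not expose (hypothetical helper)."""
    return [name for name in expected if not hasattr(module, name)]


# Stand-in for the object returned by get_kernel(...); it exposes only two names.
fake_module = SimpleNamespace(__version__="0.0.0", get_mla_metadata=lambda: None)
print(missing_names(fake_module))
```

With a real module, passing `kernel_module` instead of `fake_module` should yield an empty list if every listed name is present.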

Benchmarks

A benchmarking script is available for this kernel. Run `kernels benchmark kernels-community/flash-mla --version 1`.

License: MIT

Supported hardware

  • CUDA (compute capability): 9.0a, 10.0f
  • GPU: B300 (288GB), B200 (192GB), H200 (141GB), H100 (80GB), H800 (80GB), H20 (96GB)
  • OS: linux
  • Arch: x86_64, aarch64
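
Before loading the kernel, it can help to check whether the local GPU matches the CUDA targets listed above. The following is a minimal sketch under stated assumptions: `is_supported_capability` and `current_device_supported` are hypothetical helpers, "9.0a" is read as compute capability 9.0 exactly, and "10.0f" is read as covering the whole 10.x family. The kernel's own loader may apply different rules.

```python
def is_supported_capability(major, minor):
    """Hypothetical check against the CUDA targets listed on this card.

    Assumptions: '9.0a' means compute capability 9.0 exactly, and
    '10.0f' (a family-specific target) covers devices in the 10.x family.
    """
    return (major, minor) == (9, 0) or major == 10


def current_device_supported():
    """Check the active CUDA device; returns False when CUDA is unavailable."""
    import torch  # imported lazily so the pure check above needs no GPU stack

    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return is_supported_capability(major, minor)
```

The pure function keeps the policy separate from device discovery, so it can be exercised on a machine without a GPU.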