Transformers documentation

SLANet


This model was released on 2025-03-07 and added to Hugging Face Transformers on 2026-04-22.

SLANet

PyTorch

Overview

SLANet and SLANet_plus are part of a series of dedicated lightweight models for table structure recognition, focusing on accurately recognizing table structures in documents and natural scenes. For more details about the SLANet series models, please refer to the official documentation.

Model Architecture

SLANet is a table structure recognition model developed by Baidu PaddlePaddle Vision Team. The model significantly improves the accuracy and inference speed of table structure recognition by adopting a CPU-friendly lightweight backbone network PP-LCNet, a high-low-level feature fusion module CSP-PAN, and a feature decoding module SLA Head that aligns structural and positional information.

Usage

Single input inference

The example below demonstrates how to recognize table structure with SLANet using the Auto classes.

AutoModel

```python
from io import BytesIO

import httpx
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForTableRecognition

model_path = "PaddlePaddle/SLANet_plus_safetensors"
model = AutoModelForTableRecognition.from_pretrained(model_path, device_map="auto")
image_processor = AutoImageProcessor.from_pretrained(model_path)

image_url = "https://example.com/table.png"  # replace with the URL of your table image
image = Image.open(BytesIO(httpx.get(image_url).content))
inputs = image_processor(images=image, return_tensors="pt").to(model.device)
outputs = model(**inputs)

results = image_processor.post_process_table_recognition(outputs)

result = results[0]  # one result per input image
print(result["structure"])
print(result["structure_score"])
```
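The `structure` field is a sequence of HTML-like structure tokens. A minimal sketch of stitching such tokens together with recognized cell text into a renderable HTML string (the token format and the cell-text pairing here are illustrative assumptions, not the library's own post-processing):

```python
def tokens_to_html(structure_tokens, cell_texts):
    """Insert recognized cell text between each <td>...</td> pair."""
    parts, cells = [], iter(cell_texts)
    for token in structure_tokens:
        parts.append(token)
        if token == "<td>":
            parts.append(next(cells, ""))  # empty string if texts run out
    return "".join(parts)

# Hypothetical structure tokens for a one-row, two-column table.
structure = ["<table>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</table>"]
html = tokens_to_html(structure, ["Name", "Score"])
print(html)
# <table><tr><td>Name</td><td>Score</td></tr></table>
```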

SLANetConfig

class transformers.SLANetConfig

( transformers_version: str | None = None architectures: list[str] | None = None output_hidden_states: bool | None = False return_dict: bool | None = True dtype: typing.Union[str, ForwardRef('torch.dtype'), NoneType] = None chunk_size_feed_forward: int = 0 is_encoder_decoder: bool = False id2label: dict[int, str] | dict[str, str] | None = None label2id: dict[str, int] | dict[str, str] | None = None problem_type: typing.Optional[typing.Literal['regression', 'single_label_classification', 'multi_label_classification']] = None post_conv_out_channels: int = 96 out_channels: int = 50 hidden_size: int = 256 max_text_length: int = 500 backbone_config: dict | transformers.configuration_utils.PreTrainedConfig | None = None hidden_act: str = 'hardswish' csp_kernel_size: int = 5 csp_num_blocks: int = 1 )

Parameters

  • post_conv_out_channels (int, optional, defaults to 96) — Number of output channels for the post-encoder convolution layer.
  • out_channels (int, optional, defaults to 50) — Vocabulary size for the table structure token prediction head, i.e., the number of distinct structure tokens the model can predict.
  • hidden_size (int, optional, defaults to 256) — Dimensionality of the hidden states in the attention GRU cell and the structure/location prediction heads.
  • max_text_length (int, optional, defaults to 500) — Maximum number of autoregressive decoding steps (tokens) for the structure and location decoder.
  • backbone_config (Union[dict, ~configuration_utils.PreTrainedConfig], optional) — The configuration of the backbone model.
  • hidden_act (str, optional, defaults to hardswish) — The non-linear activation function (function or string) in the decoder. For example, "gelu", "relu", "silu", etc.
  • csp_kernel_size (int, optional, defaults to 5) — The kernel size of the Cross Stage Partial (CSP) layer.
  • csp_num_blocks (int, optional, defaults to 1) — Number of blocks within the Cross Stage Partial (CSP) layer.

This is the configuration class to store the configuration of a SLANet model. It is used to instantiate a SLANet model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the PaddlePaddle/SLANet_plus_safetensors architecture.

Configuration objects inherit from PreTrainedConfig and can be used to control the model outputs. Read the documentation from PreTrainedConfig for more information.

SLANetForTableRecognition

class transformers.SLANetForTableRecognition

( config: SLANetConfig )

Parameters

  • config (SLANetConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

SLANet model for table structure recognition tasks. Wraps the core SLANetPreTrainedModel and returns outputs compatible with the Transformers table recognition API.

This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

forward

( pixel_values: FloatTensor **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) SLANetForTableRecognitionOutput or tuple(torch.FloatTensor)

Parameters

  • pixel_values (torch.FloatTensor of shape (batch_size, num_channels, image_size, image_size)) — The tensors corresponding to the input images. Pixel values can be obtained using SLANeXtImageProcessor. See SLANeXtImageProcessor.__call__() for details.

Returns

SLANetForTableRecognitionOutput or tuple(torch.FloatTensor)

A SLANetForTableRecognitionOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SLANetConfig) and inputs.

The SLANetForTableRecognition forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

  • last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) — Sequence of hidden-states at the output of the last layer of the model.

  • hidden_states (tuple[torch.FloatTensor, ...], optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.

  • head_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Hidden-states of the SLANetSLAHead at each prediction step; contains at most self.config.max_text_length states (fewer if decoding exits early).

  • head_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Attentions of the SLANetSLAHead at each prediction step; contains at most self.config.max_text_length attentions (fewer if decoding exits early).

SLANetBackbone

class transformers.SLANetBackbone

( config: SLANetConfig )

forward

( hidden_states: FloatTensor **kwargs: typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs] ) BaseModelOutputWithNoAttention or tuple(torch.FloatTensor)

Parameters

  • hidden_states (torch.FloatTensor) — Input to the layer of shape `(batch, seq_len, embed_dim)`.

Returns

BaseModelOutputWithNoAttention or tuple(torch.FloatTensor)

A BaseModelOutputWithNoAttention or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (SLANetConfig) and inputs.

The SLANetBackbone forward method, overrides the __call__ special method.

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.

  • last_hidden_state (torch.FloatTensor of shape (batch_size, num_channels, height, width)) — Sequence of hidden-states at the output of the last layer of the model.

  • hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, num_channels, height, width).

    Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.

SLANetSLAHead

class transformers.SLANetSLAHead

( config: dict | None = None **kwargs )
