Transformers v5.6.0 adds four new models and serve enhancements
The `huggingface/transformers` `v5.6.0` release, authored by vasqu, adds four new models — OpenAI Privacy Filter, QianfanOCR, SAM3-LiteText, and SLANet — alongside major enhancements to the `transformers serve` command and a breaking change to kernel function registration.
Teams building agentic pipelines should audit any custom Attention module code for `self.rotary_fn(...)` calls before upgrading to `v5.6.0`, and can immediately leverage the new `/v1/completions` endpoint and multimodal serve support for production deployments.
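One way to run that audit is a small script that flags every line calling `self.rotary_fn(...)`. The following is a minimal sketch, not an official migration tool: the helper names `find_rotary_fn_calls` and `audit_tree` are hypothetical, and the comment stripping is a simple heuristic that does not parse strings.

```python
import re
from pathlib import Path

# Matches calls such as `self.rotary_fn(q, k, cos, sin)`, tolerating extra whitespace.
ROTARY_CALL = re.compile(r"\bself\s*\.\s*rotary_fn\s*\(")


def find_rotary_fn_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line that calls self.rotary_fn."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Heuristic: drop everything after the first '#', so commented-out calls are ignored.
        code = line.split("#", 1)[0]
        if ROTARY_CALL.search(code):
            hits.append((lineno, line.strip()))
    return hits


def audit_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root and collect the flagged lines per file."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_rotary_fn_calls(text)
        if hits:
            report[str(path)] = hits
    return report
```

Calling `audit_tree(".")` from a project root returns a mapping from file path to the flagged lines, which can then be reviewed by hand before upgrading.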
The `v5.6.0` release of `huggingface/transformers`, published by vasqu, ships four new model integrations. The OpenAI Privacy Filter is a bidirectional token-classification model for PII detection and masking, designed for high-throughput on-premises data sanitization. In a single forward pass it predicts, for each token, a probability distribution over 8 privacy-related categories, then decodes spans with a constrained Viterbi procedure.
QianfanOCR, developed by Baidu, is a 4B-parameter end-to-end document intelligence model that replaces the traditional multi-stage OCR pipeline. Its "Layout-as-Thought" capability generates structured layout representations before producing final outputs, enabling tasks like table extraction, chart understanding, and document question answering within a single unified model.
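To make the Privacy Filter's decoding step concrete, here is a minimal sketch of constrained Viterbi decoding over per-token label probabilities. It is an illustration only: the BIO tagging scheme, the two example categories, and the single transition rule are assumptions, since the model's actual label set and constraints are not described here.

```python
import math

# Hypothetical label set for illustration; not the model's real categories.
LABELS = ["O", "B-NAME", "I-NAME", "B-EMAIL", "I-EMAIL"]


def allowed(prev: str, curr: str) -> bool:
    """Assumed BIO constraint: I-X may only follow B-X or I-X of the same category."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def constrained_viterbi(probs: list[list[float]]) -> list[str]:
    """Return the most probable label sequence that never violates allowed()."""
    if not probs:
        return []
    n = len(LABELS)
    neg_inf = float("-inf")
    logp = [[math.log(p) if p > 0 else neg_inf for p in row] for row in probs]
    # A sequence cannot start inside a span, so I-* labels are excluded at position 0.
    score = [neg_inf if LABELS[j].startswith("I-") else logp[0][j] for j in range(n)]
    backptrs = []
    for t in range(1, len(logp)):
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor among the labels allowed to precede label j.
            best_i = max(
                (i for i in range(n) if allowed(LABELS[i], LABELS[j])),
                key=lambda i: score[i],
            )
            new_score.append(score[best_i] + logp[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final label to recover the full path.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    return [LABELS[i] for i in reversed(path)]


# Per-token argmax would give O, I-NAME, I-NAME, which starts a span with I-NAME.
example = [
    [0.90, 0.05, 0.03, 0.01, 0.01],
    [0.05, 0.40, 0.50, 0.03, 0.02],
    [0.10, 0.05, 0.80, 0.03, 0.02],
]
decoded = constrained_viterbi(example)  # ["O", "B-NAME", "I-NAME"]
```

The constraint is what separates this from taking the most likely label per token: the decoder trades a slightly less likely label at one position for a sequence whose spans are well formed.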
SAM3-LiteText replaces the original SAM3 text encoder (353M parameters) with a compact MobileCLIP-based encoder trained via knowledge distillation, reducing text encoder parameter count by up to 88% while preserving segmentation performance and keeping the SAM3 ViT-H image encoder intact. SLANet and SLANet_plus, from Baidu's PaddlePaddle Vision Team, are lightweight table structure recognition models built on the CPU-friendly PP-LCNet backbone, a CSP-PAN feature fusion module, and an SLA Head decoder.
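As a quick sanity check on the SAM3-LiteText figures, the arithmetic below works out what an 88% reduction from a 353M-parameter text encoder implies. The result is derived only from the two numbers quoted above; it is not a reported parameter count for any specific SAM3-LiteText variant.

```python
ORIGINAL_TEXT_ENCODER_PARAMS = 353_000_000  # original SAM3 text encoder, as quoted above
MAX_REDUCTION = 0.88                        # "up to 88%"


def remaining_params(original: int, reduction: float) -> int:
    """Parameters left after removing the given fraction of the original count."""
    return round(original * (1.0 - reduction))


smallest = remaining_params(ORIGINAL_TEXT_ENCODER_PARAMS, MAX_REDUCTION)
print(f"{smallest / 1e6:.1f}M parameters at the maximum quoted reduction")
```

Under these numbers the smallest distilled text encoder would have roughly 42M parameters. Because the reduction is stated as "up to" 88%, other variants would be larger.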
On the serving side, `transformers serve` gained a new `/v1/completions` endpoint for legacy OpenAI-compatible text completion, multimodal support for audio and video inputs, improved tool calling via `parse_response`, proper forwarding of the `tool_calls` and `tool_call_id` fields, an HTTP 400 response when a server pinned to one model receives a request for a different model, and updated documentation covering options such as `--compile` and `--model-timeout`.
The release also includes a breaking change: the internal `rotary_fn` is no longer registered as a hidden kernel function, so any custom Attention module code that calls `self.rotary_fn(...)` must be updated to invoke the function directly.
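For the new `/v1/completions` endpoint, a client call might look like the sketch below. It assumes a `transformers serve` instance is already running and reachable at the placeholder address in `BASE_URL`, and that the request body follows the legacy OpenAI completion schema (`model`, `prompt`, `max_tokens`); which optional fields the server honors is not stated here. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: placeholder address of a locally running `transformers serve` instance.
BASE_URL = "http://localhost:8000"


def build_completion_request(
    model: str, prompt: str, max_tokens: int = 64
) -> urllib.request.Request:
    """Build a POST request in the legacy OpenAI text-completion format."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(model: str, prompt: str, max_tokens: int = 64) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    request = build_completion_request(model, prompt, max_tokens)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Legacy completion responses carry the generated text under choices[i].text.
    return body["choices"][0]["text"]
```

Against a server pinned to a single model, the `model` field should name that same model; per the release notes, a mismatch is answered with a 400 error, which `urllib` surfaces as an `urllib.error.HTTPError`.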
Key facts
- OpenAI Privacy Filter is a bidirectional token-classification model for PII detection and masking that predicts a probability distribution over 8 privacy-related categories per token
- QianfanOCR is a 4B-parameter document intelligence model from Baidu featuring a "Layout-as-Thought" capability for complex document parsing
- SAM3-LiteText reduces text encoder parameters by up to 88% via knowledge distillation into a MobileCLIP-based encoder, while keeping the ViT-H image encoder