Apr 22, 2026 · 1 min read · New Models & Releases

Transformers v5.6.0 adds four new models and serve enhancements

The `huggingface/transformers` `v5.6.0` release, authored by vasqu, adds four new models — OpenAI Privacy Filter, QianfanOCR, SAM3-LiteText, and SLANet — alongside major enhancements to the `transformers serve` command and a breaking change to kernel function registration.

GitHub: huggingface/transformers · vasqu

Composite score: 5.5 out of 10. Weights applied: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%.

Why it matters

Teams building agentic pipelines should audit any custom Attention module code for `self.rotary_fn(...)` calls before upgrading to `v5.6.0`, and can immediately leverage the new `/v1/completions` endpoint and multimodal serve support for production deployments.
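One way to run that audit is a small static scan, sketched here under stated assumptions. It uses only Python's standard `ast` module to locate `self.rotary_fn(...)` calls in source text. The function name `find_rotary_fn_calls` and the `SAMPLE` module are hypothetical and exist only for illustration; they are not part of `transformers`.

```python
import ast


def find_rotary_fn_calls(source, filename="<string>"):
    """Return sorted (line, column) pairs for every `self.rotary_fn(...)` call."""
    hits = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "rotary_fn"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "self"
        ):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


# Hypothetical custom Attention module, used only to exercise the scanner.
SAMPLE = '''
class MyAttention:
    def forward(self, q, k, cos, sin):
        q, k = self.rotary_fn(q, k, cos, sin)
        return q, k
'''

if __name__ == "__main__":
    for line, col in find_rotary_fn_calls(SAMPLE):
        print(f"line {line}, col {col}: self.rotary_fn(...) call found")
```

To scan a real codebase, the same function could be applied to the text of each `.py` file, for example by reading files with `pathlib.Path.rglob("*.py")`. A syntax-tree scan ignores matches inside comments and strings, which a plain text search would report.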

Summary: our read of the original

The `v5.6.0` release of `huggingface/transformers`, published by vasqu, ships four new model integrations. The OpenAI Privacy Filter is a bidirectional token-classification model for PII detection and masking, designed for high-throughput on-premises data sanitization. It labels the whole input in a single forward pass, predicting a probability distribution over 8 privacy-related categories for each token, and then decodes coherent spans with a constrained Viterbi procedure. QianfanOCR, developed by Baidu, is a 4B-parameter end-to-end document intelligence model that replaces the traditional multi-stage OCR pipeline with a single model. Its "Layout-as-Thought" capability generates structured layout representations before producing final outputs, which supports table extraction, chart understanding, and document question answering within that one model.
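To make the span-decoding step concrete, here is a minimal sketch of constrained Viterbi decoding in plain Python. It is not the Privacy Filter's actual implementation: the BIO tagging scheme, the `EMAIL` label, and the single constraint used (an `I-X` tag may only follow `B-X` or `I-X`) are assumptions chosen for illustration, since the summary does not say which labels or constraints the model uses.

```python
import math

NEG_INF = float("-inf")


def allowed(prev, curr):
    """Assumed BIO constraint: I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in ("B-" + entity, "I-" + entity)
    return True


def constrained_viterbi(log_probs, labels):
    """Return the highest-scoring label path that satisfies `allowed`.

    log_probs[t][j] is the log-probability of labels[j] at token t.
    """
    n = len(labels)
    # A sequence may not begin inside a span, so I-* starts are ruled out.
    score = [
        NEG_INF if labels[j].startswith("I-") else log_probs[0][j]
        for j in range(n)
    ]
    back = []
    for row in log_probs[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i, best = 0, NEG_INF
            for i in range(n):
                if allowed(labels[i], labels[j]) and score[i] > best:
                    best_i, best = i, score[i]
            new_score.append(best + row[j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final label.
    j = max(range(n), key=lambda idx: score[idx])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [labels[j] for j in path]


# Hypothetical label set and per-token probabilities.
LABELS = ["O", "B-EMAIL", "I-EMAIL"]
PROBS = [
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
LOG_PROBS = [[math.log(p) for p in row] for row in PROBS]

if __name__ == "__main__":
    print(constrained_viterbi(LOG_PROBS, LABELS))
```

On this toy input, taking the most likely label at each token independently would give `O, I-EMAIL, O`, which is not a valid span because it starts inside an entity. The constrained decoder returns `B-EMAIL, I-EMAIL, O` instead, the best-scoring path that respects the transition rule.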

SAM3-LiteText replaces the original SAM3 text encoder (353M parameters) with a compact MobileCLIP-based encoder trained via knowledge distillation, reducing text encoder parameter count by up to 88% while preserving segmentation performance and keeping the SAM3 ViT-H image encoder intact. SLANet and SLANet_plus, from Baidu's PaddlePaddle Vision Team, are lightweight table structure recognition models built on the CPU-friendly PP-LCNet backbone, a CSP-PAN feature fusion module, and an SLA Head decoder.

On the serving side, `transformers serve` gains a new `/v1/completions` endpoint for legacy OpenAI-compatible text completion and multimodal support for audio and video inputs. Tool calling improves through `parse_response` and proper forwarding of the `tool_calls` and `tool_call_id` fields. A model-pinned server now returns an HTTP 400 error when it receives a request for a mismatched model, and the documentation has been updated to cover options such as `--compile` and `--model-timeout`. The release also includes a breaking change: the internal `rotary_fn` is no longer registered as a hidden kernel function, so any custom Attention module code calling `self.rotary_fn(...)` must be updated to invoke the function directly.
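As a rough illustration of the new endpoint, the sketch below builds a legacy OpenAI-style completion request using only the Python standard library. Several details are assumptions rather than facts from the release notes: the base URL `http://localhost:8000`, the payload fields (`model`, `prompt`, `max_tokens`), and the `choices[0]["text"]` response shape all follow the legacy OpenAI completions format that the endpoint is described as compatible with. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_completion_request(prompt, model, base_url="http://localhost:8000", max_tokens=32):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    # Payload fields follow the legacy OpenAI completions format (assumed).
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, model, **kwargs):
    """Send the request and return the generated text, or None on HTTP 400."""
    request = build_completion_request(prompt, model, **kwargs)
    try:
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # For example, a model-pinned server rejecting a mismatched model.
            return None
        raise
    # Response shape assumed from the legacy OpenAI completions format.
    return body["choices"][0]["text"]
```

With a server already running, a call such as `complete("Once upon a time", model="<your-model-id>")` would return the generated continuation, where the model ID is a placeholder for whichever checkpoint the server is serving. Keeping request construction separate from sending makes the payload easy to inspect without a live server.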

Key facts

01 OpenAI Privacy Filter is a bidirectional token-classification model for PII detection and masking that predicts a probability distribution over 8 privacy-related categories per token
02 QianfanOCR is a 4B-parameter document intelligence model from Baidu featuring a "Layout-as-Thought" capability for complex document parsing
03 SAM3-LiteText reduces text encoder parameters by up to 88% via knowledge distillation using a MobileCLIP-based encoder, while keeping the ViT-H image encoder
04 SLANet is a CPU-friendly table structure recognition model from Baidu PaddlePaddle using the PP-LCNet backbone and CSP-PAN feature fusion
05 `transformers serve` gains a `/v1/completions` endpoint, multimodal audio/video support, and an HTTP 400 error on model mismatch
06 Breaking change: `rotary_fn` is no longer registered as a hidden kernel function; code calling `self.rotary_fn(...)` in Attention modules must be updated

Topics

#model-release #open-source #transformers #ocr #vision-language

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 22, 2026 · 19:13 UTC.
