Jun 8, 2026·1 min readAgentic Coding

LangGraph tool calling breaks in subtle ways when swapping LLM providers

A developer running a six-node LangGraph data analysis agent in production discovered four categories of tool-calling incompatibilities when swapping between gpt-5.4, GPT-OSS 120B, and DeepSeek V4-Pro on OpenAI-compatible endpoints.

r/LangChain·u/whyleaving

Read at source

Composite

4.9

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

The post demonstrates that `bind_tools` abstraction holds for one-shot structured output but breaks in at least four concrete ways inside stateful LangGraph loops, meaning production multi-provider agent deployments require explicit normalization logic that the framework does not provide out of the box.

01Six-node LangGraph agent uses schema lookup, SQL execution, and result formatting tools
02Originally built on gpt-5.4; tested against GPT-OSS 120B and DeepSeek V4-Pro on GMI Cloud
03DeepSeek V4-Pro occasionally splits parallel tool calls across two messages instead of one AIMessage

Summary— our read of the original

u/whyleaving details a production LangGraph agent with a six-node graph performing schema lookup, SQL execution, and result formatting. Originally built on gpt-5.4 with `bind_tools` and parallel tool calls, the agent was migrated to cheaper alternatives — GPT-OSS 120B and DeepSeek V4-Pro on GMI Cloud — using the standard `ChatOpenAI(base_url=...)` pattern. While the surface API looks identical, the author found four concrete failure modes that only surface inside a stateful LangGraph loop.

Third, `tool_choice="required"` is honored by gpt-5.4 but silently ignored on smaller models in roughly 5–8% of runs, returning a plain `AIMessage` with no tool call.

First, parallel tool calls: gpt-5.4 returns a single `AIMessage` with multiple `tool_calls` populated, but DeepSeek V4-Pro occasionally splits two calls across two separate messages — a shape that `ToolNode` handles but a custom reducer did not. Second, tool call IDs: some endpoints return OpenAI-style `call_id`s, some return short hashes, and some return empty IDs under load; `MessagesState` relies on these IDs to pair `ToolMessage` replies, and an empty ID causes a 400 error on the next turn. Third, `tool_choice="required"` is honored by gpt-5.4 but silently ignored on smaller models in roughly 5–8% of runs, returning a plain `AIMessage` with no tool call. Fourth, nested object schemas are occasionally returned with args as a JSON-encoded string rather than a parsed dict, at a rate of about 1 in 30 calls on some hosts.

The author's current mitigation is a thin wrapper around the model node that fills in missing `call_id`s, parses stringified args, and retries once with a stricter system prompt when no tool call is returned. The open problem is p95 latency: p50 is comparable across providers, but p95 diverges enough to threaten SLO compliance, and no provider-agnostic solution has been found.

Key facts

01Six-node LangGraph agent uses schema lookup, SQL execution, and result formatting tools
02Originally built on gpt-5.4; tested against GPT-OSS 120B and DeepSeek V4-Pro on GMI Cloud
03DeepSeek V4-Pro occasionally splits parallel tool calls across two messages instead of one AIMessage
04Some OpenAI-compatible endpoints return empty tool call IDs under load, causing 400 errors on the next turn
05`tool_choice="required"` is silently ignored on smaller models in roughly 5–8% of runs
06Nested object args come back as a JSON-encoded string instead of a dict about 1 in 30 calls on some hosts
07p95 latency diverges significantly across providers; p50 is comparable

Topics

#tool-use #agent-framework #langgraph #production-deployment #llm-compatibility

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC. How this works →

Jun 8, 2026·1 min readAgentic Coding

LangGraph tool calling breaks in subtle ways when swapping LLM providers

r/LangChain·u/whyleaving

Read at source

Composite

4.9

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

01Six-node LangGraph agent uses schema lookup, SQL execution, and result formatting tools
02Originally built on gpt-5.4; tested against GPT-OSS 120B and DeepSeek V4-Pro on GMI Cloud
03DeepSeek V4-Pro occasionally splits parallel tool calls across two messages instead of one AIMessage

Summary— our read of the original

Third, `tool_choice="required"` is honored by gpt-5.4 but silently ignored on smaller models in roughly 5–8% of runs, returning a plain `AIMessage` with no tool call.

Key facts

01Six-node LangGraph agent uses schema lookup, SQL execution, and result formatting tools
02Originally built on gpt-5.4; tested against GPT-OSS 120B and DeepSeek V4-Pro on GMI Cloud
03DeepSeek V4-Pro occasionally splits parallel tool calls across two messages instead of one AIMessage
04Some OpenAI-compatible endpoints return empty tool call IDs under load, causing 400 errors on the next turn
05`tool_choice="required"` is silently ignored on smaller models in roughly 5–8% of runs
06Nested object args come back as a JSON-encoded string instead of a dict about 1 in 30 calls on some hosts
07p95 latency diverges significantly across providers; p50 is comparable

Topics

#tool-use #agent-framework #langgraph #production-deployment #llm-compatibility

Methodology

Score breakdown

Key facts

Topics

More in Agentic Coding.

Score breakdown

Key facts

Topics

More in Agentic Coding.