LangGraph tool calling breaks in subtle ways when swapping LLM providers
A developer running a six-node LangGraph data analysis agent in production discovered four categories of tool-calling incompatibilities when swapping between gpt-5.4, GPT-OSS 120B, and DeepSeek V4-Pro on OpenAI-compatible endpoints.
Score breakdown
The post demonstrates that `bind_tools` abstraction holds for one-shot structured output but breaks in at least four concrete ways inside stateful LangGraph loops, meaning production multi-provider agent deployments require explicit normalization logic that the framework does not provide out of the box.
- 01Six-node LangGraph agent uses schema lookup, SQL execution, and result formatting tools
- 02Originally built on gpt-5.4; tested against GPT-OSS 120B and DeepSeek V4-Pro on GMI Cloud
- 03DeepSeek V4-Pro occasionally splits parallel tool calls across two messages instead of one AIMessage
u/whyleaving details a production LangGraph agent with a six-node graph performing schema lookup, SQL execution, and result formatting. Originally built on gpt-5.4 with `bind_tools` and parallel tool calls, the agent was migrated to cheaper alternatives — GPT-OSS 120B and DeepSeek V4-Pro on GMI Cloud — using the standard `ChatOpenAI(base_url=...)` pattern. While the surface API looks identical, the author found four concrete failure modes that only surface inside a stateful LangGraph loop.
Third, `tool_choice="required"` is honored by gpt-5.4 but silently ignored on smaller models in roughly 5–8% of runs, returning a plain `AIMessage` with no tool call.
First, parallel tool calls: gpt-5.4 returns a single `AIMessage` with multiple `tool_calls` populated, but DeepSeek V4-Pro occasionally splits two calls across two separate messages — a shape that `ToolNode` handles but a custom reducer did not. Second, tool call IDs: some endpoints return OpenAI-style `call_id`s, some return short hashes, and some return empty IDs under load; `MessagesState` relies on these IDs to pair `ToolMessage` replies, and an empty ID causes a 400 error on the next turn. Third, `tool_choice="required"` is honored by gpt-5.4 but silently ignored on smaller models in roughly 5–8% of runs, returning a plain `AIMessage` with no tool call. Fourth, nested object schemas are occasionally returned with args as a JSON-encoded string rather than a parsed dict, at a rate of about 1 in 30 calls on some hosts.
The author's current mitigation is a thin wrapper around the model node that fills in missing `call_id`s, parses stringified args, and retries once with a stricter system prompt when no tool call is returned. The open problem is p95 latency: p50 is comparable across providers, but p95 diverges enough to threaten SLO compliance, and no provider-agnostic solution has been found.
Key facts
- 01Six-node LangGraph agent uses schema lookup, SQL execution, and result formatting tools
- 02Originally built on gpt-5.4; tested against GPT-OSS 120B and DeepSeek V4-Pro on GMI Cloud
- 03DeepSeek V4-Pro occasionally splits parallel tool calls across two messages instead of one AIMessage
- 04Some OpenAI-compatible endpoints return empty tool call IDs under load, causing 400 errors on the next turn
- 05`tool_choice="required"` is silently ignored on smaller models in roughly 5–8% of runs
- 06Nested object args come back as a JSON-encoded string instead of a dict about 1 in 30 calls on some hosts
- 07p95 latency diverges significantly across providers; p50 is comparable
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC. How this works →