DeepSeek V4 Pro beats Opus 4.7 with a tool-calling repair layer
Ahmad Awais, CEO of CommandCode.ai, built a lightweight "validate-then-repair" tool-input repair layer into the company's open-source AI CLI. With that layer in place, DeepSeek V4 Pro outperformed Opus 4.7 in 6 out of 10 internal evaluations.
Score breakdown
The central finding is that open-model tool-calling failures are largely harness and contract issues, fixable with a repair layer rather than a more expensive model. That finding underpins DeepSeek V4 Pro outperforming Opus 4.7 in 6 of CommandCode's 10 internal evaluations.
In a Latent Space interview, Ahmad Awais (CEO of CommandCode.ai, former VP of DevRel at RapidAPI, and author of 300+ open-source repositories) traces the origins of CommandCode to a CLI project he started in 2020 after receiving early GPT-3 access. The goal was to suggest the next line of code, more than a year before GitHub Copilot launched. That six-year-old codebase eventually evolved into CommandCode, built around the conviction that a coding agent is the only type of agent needed.
Central to the discussion is a meta-neuro-symbolic framework called "Taste," and a specific engineering insight about tool-calling reliability. By studying failure patterns across billions of tokens, Awais identified what he calls the "Tool Confusion" phenomenon in open models and concluded that most of these failures are harness and contract issues rather than fundamental model limitations. This led to a "validate-then-repair" architecture: a lightweight tool-input repair layer that uses targeted repairs, semantic hints, and transparent feedback to fix malformed tool calls on the fly. The result: DeepSeek V4 Pro outperformed Opus 4.7 in 6 out of 10 internal evaluations, with no change to the underlying model. The interview also touches on CommandCode's roadmap, including plans to open-source the project.
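The summary does not describe how CommandCode implements its repair layer, so the following is only a minimal sketch of what a validate-then-repair step could look like. Everything in it is an assumption made for illustration: the `read_file`-style schema, the alias table, and the names `validate`, `repair`, and `validate_then_repair` are hypothetical and not taken from CommandCode's code. It shows the three ideas named above in miniature: targeted repairs (parsing stringified JSON, coercing numeric strings), semantic hints (mapping likely field aliases onto the contract's names), and transparent feedback (a list of notes that can be returned to the model).

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Hypothetical tool contract: field name -> expected Python type.
READ_FILE_SCHEMA = {"path": str, "max_lines": int}

# Hypothetical semantic hints: names a model might use instead of the contract's.
ALIASES = {"file": "path", "file_path": "path", "filename": "path", "limit": "max_lines"}


@dataclass
class RepairResult:
    ok: bool                      # True if the final arguments satisfy the contract
    args: dict[str, Any]          # arguments after any repairs
    notes: list[str] = field(default_factory=list)  # feedback to surface to the model


def validate(args: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of contract violations (empty means valid)."""
    errors = []
    for name, expected in schema.items():
        if name not in args:
            errors.append(f"missing field '{name}'")
        elif type(args[name]) is not expected:
            errors.append(f"field '{name}' should be {expected.__name__}")
    for name in args:
        if name not in schema:
            errors.append(f"unknown field '{name}'")
    return errors


def repair(raw: Any, schema: dict[str, type], aliases: dict[str, str]) -> RepairResult:
    """Apply a few targeted repairs, then validate again."""
    notes: list[str] = []
    # Repair 1: arguments arrived as a JSON string instead of an object.
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
            notes.append("parsed arguments from a JSON string")
        except json.JSONDecodeError:
            return RepairResult(False, {}, ["arguments are not valid JSON"])
    if not isinstance(raw, dict):
        return RepairResult(False, {}, ["arguments must be a JSON object"])

    args: dict[str, Any] = {}
    for key, value in raw.items():
        # Repair 2: rename known aliases to the contract's field names.
        name = aliases.get(key, key)
        if name != key:
            notes.append(f"renamed '{key}' to '{name}'")
        # Repair 3: coerce numeric strings where the contract expects an integer.
        if schema.get(name) is int and isinstance(value, str):
            try:
                value = int(value)
                notes.append(f"converted '{name}' from string to integer")
            except ValueError:
                pass  # leave as-is; validation below reports the type error
        args[name] = value

    errors = validate(args, schema)
    return RepairResult(not errors, args, notes + errors)


def validate_then_repair(raw: Any, schema=READ_FILE_SCHEMA, aliases=ALIASES) -> RepairResult:
    """Fast path for valid input; fall back to repair only when validation fails."""
    if isinstance(raw, dict) and not validate(raw, schema):
        return RepairResult(True, raw)
    return repair(raw, schema, aliases)


result = validate_then_repair('{"file_path": "src/main.py", "limit": "40"}')
print(result.ok, result.args)
print(result.notes)
```

In this sketch a well-formed call passes through untouched, a fixable call is repaired and annotated, and an unfixable call comes back with `ok` set to `False` plus the remaining violations. A harness could send those notes back to the model as the tool result instead of a bare rejection, which is one plausible reading of "transparent feedback."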
Key facts
- Ahmad Awais is CEO of CommandCode.ai and has published 300+ open-source repositories.
- He received early GPT-3 access in July 2020 and began building a code-suggestion CLI, predating GitHub Copilot by over a year.
- Analyzing failure patterns across billions of tokens led to a shift from rigid validation to a "validate-then-repair" approach.
- The tool-input repair layer is lightweight and built into their open-source AI CLI.
- DeepSeek V4 Pro outperformed Opus 4.7 in 6 out of 10 internal evaluations using this approach.
- The core finding is that most open-model tool-calling failures are harness/contract issues, not true capability gaps.
- CommandCode is built around a meta-neuro-symbolic framework called "Taste," and open-sourcing the project is on the roadmap.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.