MCP server pre-publish checklist targets invisible agent failures
The post by pengspirit outlines a 10-point checklist for MCP server readiness — enforced by the open-source CLI `mcp-probe` — covering tool descriptions, argument schemas, mutation legibility, and install metadata before publishing.
Score breakdown
The checklist and `mcp-probe` score expose a class of MCP server defects — ambiguous tool descriptions, missing argument metadata, and silent `initialize` drops — that pass standard connectivity tests but cause agents to pick wrong tools or hallucinate arguments at runtime.
pengspirit's post reframes MCP server quality around agent usability rather than basic connectivity. Tools like MCP Inspector confirm that a server connects and lists tools, but the post argues this is necessary, not sufficient: a server can pass Inspector and still be functionally unpublishable if a model cannot distinguish its tools from similarly named ones in other servers. The ten checks cover:
- 01 Clean transport connection (stdio or HTTP).
- 02 Complete tool/resource/prompt listing after `initialize`.
- 03 No `initialize` timeout (large tool lists can be silently dropped).
- 04 Real tool descriptions, not restated names.
- 05 No naming collisions.
- 06 Fully described arguments, with required fields and enumerated enums.
- 07 Legible mutations: tools that write, delete, or charge must say so.
- 08 Input validation that returns useful errors.
- 09 Explicit enum and shape constraints.
- 10 Complete install metadata, including a `server.json` for MCP Registry discovery.
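Several of the checks above (real descriptions, described arguments, legible mutations) can be illustrated with a small lint over a tool definition. The sketch below is a hypothetical illustration, not how `mcp-probe` works internally: the `lint_tool` function, the `MUTATION_VERBS` list, and both example tools are assumptions made for this example. It relies only on the standard MCP tool shape of `name`, `description`, and a JSON Schema `inputSchema`.

```python
import re

# Hypothetical verb list; a real checker would likely use a richer one.
MUTATION_VERBS = ("create", "update", "delete", "write", "send", "charge")


def _words(text):
    """Lowercase alphanumeric words; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lint_tool(tool):
    """Return a list of human-readable problems for one tool definition."""
    problems = []
    name = tool.get("name", "")
    description = tool.get("description", "")

    # A description that adds no words beyond the name just restates it.
    if not (_words(description) - _words(name)):
        problems.append("description restates the tool name")

    # Every argument in the JSON Schema should carry its own description.
    schema = tool.get("inputSchema", {})
    for arg, spec in schema.get("properties", {}).items():
        if not spec.get("description"):
            problems.append(f"argument '{arg}' has no description")

    # A tool whose name implies a mutation should say so in its description.
    if any(verb in _words(name) for verb in MUTATION_VERBS):
        if not any(verb in description.lower() for verb in MUTATION_VERBS):
            problems.append("mutation is not stated in the description")
    return problems


# Hypothetical first-draft tool: restated name, undocumented argument.
thin_tool = {
    "name": "create_issue",
    "description": "Create issue",
    "inputSchema": {
        "type": "object",
        "properties": {"priority": {"type": "string"}},
    },
}

# The same hypothetical tool with a real description and described arguments.
better_tool = {
    "name": "create_issue",
    "description": (
        "Creates a new issue in the configured tracker. "
        "This writes data and is not reversible from this tool."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Short summary"},
            "priority": {
                "type": "string",
                "enum": ["low", "medium", "high"],
                "description": "Triage priority",
            },
        },
        "required": ["title"],
    },
}
```

Calling `lint_tool(thin_tool)` reports the restated name and the undocumented `priority` argument, while `lint_tool(better_tool)` returns an empty list. Both definitions would list identically in a connectivity check, which is the gap the post describes.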
The post introduces `mcp-probe`, an open-source CLI that automates all ten checks and scores servers across five axes: description quality, enum/shape correctness, mutation legibility, anti-"restate the name" clauses, and distribution metadata. A passing server clears ~80 out of 100; the official Anthropic reference servers sit at 60 (failing on description quality); a typical first-draft community server lands in the 40s. The tool can be run with `npx @incultnitollc/mcp-probe score "node ./your-server.js"` and wired into CI with a `--fail-under 80` flag so the exit code gates the publish and prevents regression. The post singles out rewriting tool descriptions as the highest-leverage single fix — the one change that moves the most servers across the publishable line and the one most commonly skipped.
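The CI gate described above could be wrapped in a small script. This is a minimal sketch under stated assumptions: the package name, the `score` subcommand, and `--fail-under 80` come from the post, but the position of the flag after the server command is a guess, and `build_probe_command` and `gate_publish` are hypothetical helper names.

```python
import subprocess


def build_probe_command(server_command, fail_under=80):
    """Assemble the mcp-probe invocation as an argument list.

    Assumption: --fail-under is accepted after the server command.
    """
    return [
        "npx",
        "@incultnitollc/mcp-probe",
        "score",
        server_command,
        "--fail-under",
        str(fail_under),
    ]


def gate_publish(server_command, fail_under=80):
    """Run the probe and return its exit code so the CI job can fail on it."""
    result = subprocess.run(build_probe_command(server_command, fail_under))
    return result.returncode
```

A publish step might end with `sys.exit(gate_publish("node ./your-server.js"))`, so that a non-zero exit code from the probe stops the pipeline before the package is published.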
Key facts
- 01 Most MCP servers fail at least three of the ten pre-publish checks, and failures are invisible until an agent misbehaves at runtime.
- 02 The single most common failure is thin tool descriptions: even the five official Anthropic reference servers cap at 60/100 on description quality.
- 03 MCP Inspector confirms connectivity but cannot verify agent usability; a server can pass Inspector and still be functionally unpublishable.
- 04 The open-source CLI `mcp-probe` scores servers 0–100 across five axes; a passing server clears ~80, while typical first-draft community servers land in the 40s.
- 05 `mcp-probe` can be wired into CI with `--fail-under 80` so the exit code gates the publish and prevents regression.
- 06 Naming collisions (e.g., `create_issue` exists in dozens of servers) force the model to guess; the post recommends namespacing tool names or making them more specific.
- 07 Install metadata (package name, runnable example, README, and `server.json`) is required for MCP Registry discovery.
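The naming-collision point can be made concrete with a short check across the tool lists of several servers. This is a sketch only: the server names `tracker_a` and `tracker_b`, the helper names, and the `server_tool` prefix convention are assumptions for illustration, not something the post prescribes.

```python
from collections import defaultdict


def find_collisions(servers):
    """Map each tool name exposed by more than one server to those servers."""
    owners = defaultdict(list)
    for server, tool_names in servers.items():
        for tool in tool_names:
            owners[tool].append(server)
    return {tool: names for tool, names in owners.items() if len(names) > 1}


def namespaced(server, tool):
    """Prefix a tool name with its server name (one possible convention)."""
    return f"{server}_{tool}"


# Hypothetical tool listings from two servers loaded by the same agent.
servers = {
    "tracker_a": ["create_issue", "list_projects"],
    "tracker_b": ["create_issue"],
}

collisions = find_collisions(servers)
```

Here `collisions` contains `create_issue` with both server names, and `namespaced("tracker_a", "create_issue")` yields `tracker_a_create_issue`, a name the model no longer has to guess between.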
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 15, 2026 · 11:57 UTC.