Apr 20, 2026 · 2 min read · Research Papers

Study maps 2,430 Claude Code tool picks across 20 categories

A systematic survey by Edwin Ong & Alex Vikati of amplifying.ai tracked 2,430 open-ended prompts to Claude Code across 3 models, 4 project types, and 20 tool categories, revealing that the agent's choices are converging into a de facto default stack.

Hacker News · lionkor


Composite score: 7.8 out of 10

Weights: Novelty 25%, Impact 43%, Credibility 12%, Depth 20%

Why it matters

Tool vendors and developers should audit whether their preferred libraries appear in Claude Code's default stack, since the agent installs and commits code autonomously — meaning its training-data biases now directly influence which packages ship in new projects.


Summary: our read of the original

Researchers Edwin Ong & Alex Vikati at amplifying.ai conducted a systematic benchmark of Claude Code's tool selection behavior, issuing 2,430 open-ended prompts — such as "what should I use?" and "add user authentication" — against four greenfield repositories: a Next.js 14/TypeScript SaaS app (TaskFlow), a FastAPI/Python 3.11 API (DataPipeline), a Vite/React 18 SPA (InvoiceTracker), and a Node.js/TypeScript CLI (deployctl). Three models were tested — Sonnet 4.5, Opus 4.5, and Opus 4.6 — with three independent runs per model-repo combination, using a full `git checkout . && git clean -fd` between every prompt to ensure clean state. An LLM-based extraction pipeline (Claude Code subagents) identified the primary tool pick from each response, achieving an 85.3% extraction rate yielding 2,073 usable picks.
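The per-prompt reset described above can be sketched roughly as follows. This is a minimal illustration, not the authors' harness: `reset_repo`, `run_prompts`, and the `ask_agent` callback are hypothetical names, and how the study actually invoked Claude Code is not specified in this summary. Only the two git commands are taken from the text.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def reset_repo(repo: Path) -> None:
    """Restore tracked files and delete untracked files and directories."""
    # Equivalent to: git checkout . && git clean -fd
    subprocess.run(["git", "checkout", "."], cwd=repo, check=True)
    subprocess.run(["git", "clean", "-fd"], cwd=repo, check=True)


def run_prompts(
    repo: Path,
    prompts: Iterable[str],
    ask_agent: Callable[[Path, str], str],
) -> List[str]:
    """Send each prompt to the agent, starting from a clean working tree.

    `ask_agent` is a hypothetical callback (an assumption of this sketch)
    that runs the coding agent inside `repo` and returns its response text.
    """
    responses = []
    for prompt in prompts:
        reset_repo(repo)  # earlier edits must not leak into this prompt
        responses.append(ask_agent(repo, prompt))
    return responses
```

Resetting before each prompt, rather than after, means the first prompt also starts from a known state even if the repository was left dirty by an earlier session.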


The most striking finding is that Claude Code builds rather than buys: custom/DIY implementations accounted for 252 out of 2,073 primary picks (12%), making it the single most common "recommendation" across all 20 categories. Where the agent does reach for third-party tools, it converges sharply on a consistent default stack: Vercel, PostgreSQL, Stripe, Tailwind CSS, shadcn/ui, pnpm, GitHub Actions, Sentry, Resend, and Zustand, with stack-specific picks like Drizzle (JS) or SQLModel (Python) for ORMs, NextAuth.js for Next.js auth, and Vitest (JS) or pytest (Python) for testing. Several categories show near-monopoly behavior — GitHub Actions at 94% for CI/CD, shadcn/ui at 90% for UI components, and Stripe at 91% for payments. Notably, Redux and Express received zero primary picks across the entire study.
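The headline percentages follow directly from the counts quoted in this summary. The short check below recomputes them; the variable names are illustrative, and the three counts are the only inputs taken from the text.

```python
# Counts quoted in the summary
TOTAL_PROMPTS = 2430     # open-ended prompts issued
USABLE_PICKS = 2073      # responses with an identifiable primary pick
CUSTOM_DIY_PICKS = 252   # primary picks that were custom/DIY builds

extraction_rate = USABLE_PICKS / TOTAL_PROMPTS
diy_share = CUSTOM_DIY_PICKS / USABLE_PICKS
unextracted = TOTAL_PROMPTS - USABLE_PICKS

print(f"Extraction rate: {extraction_rate:.1%}")      # 85.3%
print(f"Custom/DIY share: {diy_share:.1%}")           # 12.2%
print(f"Responses without a pick: {unextracted}")     # 357
```

Note that the 12% figure is a share of the 2,073 usable picks, not of all 2,430 prompts.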

The research also examined consistency dimensions: all three models agreed on the top tool in 18 of 20 categories within the same ecosystem, with only Caching and Real-time showing genuine cross-ecosystem disagreement. Prompt phrasing had limited impact, with 76% average stability across 5 phrasings of the same category, while project context mattered more — the same category yielded different picks across repos (e.g., Vercel for Next.js vs. Railway for Python). The authors frame Claude Code as a new distribution channel where a model's training data may shape market share more than marketing budgets, making agent tool-selection behavior a form of competitive intelligence for vendors and developers alike.
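One plausible way to compute a phrasing-stability figure like the 76% above is sketched below. The summary does not give the study's exact definition, so this assumes stability means the fraction of phrasings that return the category's most common pick, averaged over categories. The function names and the toy data are hypothetical and are not results from the study.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def phrasing_stability(picks: List[str]) -> float:
    """Fraction of phrasings that agree with the most common pick.

    Assumed definition; the study may measure stability differently.
    """
    top_count = Counter(picks).most_common(1)[0][1]
    return top_count / len(picks)


def average_stability(picks_by_category: Dict[str, List[str]]) -> float:
    """Mean per-category stability across all categories."""
    return mean(phrasing_stability(p) for p in picks_by_category.values())


# Hypothetical picks for 5 phrasings per category (NOT data from the study)
example = {
    "category_1": ["ToolA", "ToolA", "ToolA", "ToolA", "ToolA"],
    "category_2": ["ToolB", "ToolB", "ToolC", "ToolB", "ToolD"],
}

print(round(average_stability(example), 2))
```

Under this definition a category where every phrasing yields the same tool scores 1.0, and one where three of five phrasings agree scores 0.6.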

Key facts

01 Study issued 2,430 open-ended prompts to Claude Code CLI v2.1.39 across 3 models (Sonnet 4.5, Opus 4.5, Opus 4.6), 4 project types, and 20 tool categories
02 Custom/DIY builds accounted for 252 of 2,073 primary picks (12%), making it the single most common 'recommendation'
03 GitHub Actions captured 94% of CI/CD picks, shadcn/ui 90% of UI component picks, and Stripe 91% of payment picks
04 All three models agreed on the top tool in 18 of 20 categories when compared within the same ecosystem
05 Prompt phrasing had limited effect: 76% average stability across 5 phrasings of the same category
06 Redux and Express received zero primary picks across the entire study
07 The LLM-based extraction pipeline achieved an 85.3% extraction rate (2,073 identifiable primary picks out of 2,430 responses)

Topics

#benchmarks #agent-framework #tool-use #coding-assistant #developer-tools

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 20, 2026 · 13:29 UTC.
