Apr 23, 2026 · 1 min read · Research Papers

Blind A/B test of 40 Claude prompt codes finds only 7 shift reasoning

Author Samarth Bhamare tested 40 circulated Claude prompt codes and found that only 7 actually changed Claude's reasoning, while the other 33 merely altered its tone or style.

Source: Hacker News · samarth0211

Composite score: 6.7 out of 10

Weights applied: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

Practitioners building Claude-based coding agents or prompt pipelines should prioritize rejection-logic prefixes like `/skeptic` and `L99` over additive "be more expert" instructions, which this study found produced no measurable reasoning improvement.

Summary: our read of the original

Samarth Bhamare conducted a self-funded, single-rater study from March–April 2026, testing 40 of the most-circulated Claude prompt codes against no-prefix baselines. Each code received n=12–20 runs with fresh context per run, fixed task batteries across coding, analysis, and writing, and blind ordering between run and rating. A 15,000-word companion PDF report with raw results by code is linked in the appendix of the gist.
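
The blinding step described above can be sketched in a few lines. This is a minimal illustration, not the author's harness: the function names (`blind_pairs`, `unblind`), the A/B labels, and the seeded shuffle are all assumptions introduced here, and the source does not describe how the ordering was actually randomized.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BlindPair:
    """One baseline/prefixed output pair as the rater sees it (hypothetical type)."""
    pair_id: int
    shown: tuple  # (output labelled "A", output labelled "B")


def blind_pairs(runs, seed=0):
    """Randomize which side each prefixed output appears on.

    runs: list of (baseline_output, prefixed_output) tuples, one per fresh-context run.
    Returns (pairs, key). key[pair_id] is the label ("A" or "B") that hides the
    prefixed output, and is kept away from the rater until rating is finished.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    pairs, key = [], {}
    for pair_id, (baseline, prefixed) in enumerate(runs):
        if rng.random() < 0.5:
            shown, key[pair_id] = (prefixed, baseline), "A"
        else:
            shown, key[pair_id] = (baseline, prefixed), "B"
        pairs.append(BlindPair(pair_id, shown))
    return pairs, key


def unblind(ratings, key):
    """Count how often the rater preferred the prefixed output.

    ratings: {pair_id: "A" or "B"}, the label the rater judged better.
    """
    return sum(1 for pair_id, label in ratings.items() if key[pair_id] == label)
```

Keeping `key` separate from `pairs` is the point of the design: the rater only ever sees `BlindPair.shown`, so a preference cannot be influenced by knowing which output carried the prompt code.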

The headline finding is that only 7 of the 40 codes produced measurable shifts in Claude's actual reasoning or decisions. The other 33 produced stylistic changes — shorter responses, less hedging, more confident tone — without altering the underlying logic. Bhamare notes this isn't worthless, but it isn't the cognitive "unlock" that viral marketing around these codes implies. The key structural insight: every effective code uses rejection logic (telling Claude what framings to refuse) rather than additive instructions (telling Claude to be more of something).

The top performer was `/skeptic`, which tells Claude to challenge the premise of a question before answering. On 14 "should I do X" decisions where the obvious answer was wrong, `/skeptic` caught the flawed premise in 11 of 14 cases (79%), versus 2 of 14 (14%) for baseline Claude — a 5.5× improvement and the largest single delta in the dataset. `L99` forced a single committed answer instead of "it depends" essays, achieving commitment in 11 of 12 binary decision questions (versus 2 of 12 at baseline), though correctness when committed was 73%, meaning confident-but-wrong outputs are a real risk. `/blindspots` surfaced at least one material hidden assumption in 9 of 11 strategy and architecture questions.
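
The arithmetic behind these figures can be checked directly. The sketch below recomputes the reported rates and ratio from the counts in the summary. It also adds a one-sided Fisher exact test, which is not part of the source and is included only as one plausible way to gauge a delta at this sample size. It assumes independent runs, which a single-rater study cannot guarantee. The function names are hypothetical.

```python
from math import comb


def rate(hits, n):
    """Fraction of runs in which the behaviour was observed."""
    return hits / n


def fisher_one_sided(a, n1, c, n2):
    """One-sided Fisher exact p-value (assumption: not a test used in the source).

    a of n1 treated runs and c of n2 baseline runs were successes. Returns the
    probability of seeing at least `a` treated successes if the prefix had no
    effect, using the hypergeometric distribution.
    """
    total_success, total = a + c, n1 + n2
    upper = min(total_success, n1)
    tail = sum(
        comb(total_success, x) * comb(total - total_success, n1 - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(total, n1)


# /skeptic: 11 of 14 flawed premises caught vs. 2 of 14 at baseline.
skeptic, baseline = rate(11, 14), rate(2, 14)
print(f"/skeptic {skeptic:.0%} vs baseline {baseline:.0%}, ratio {skeptic / baseline:.1f}x")
print(f"one-sided Fisher p-value: {fisher_one_sided(11, 14, 2, 14):.4f}")

# L99: 11 of 12 committed answers, 73% of those correct (figure from the summary).
committed = rate(11, 12)
committed_wrong = committed * (1 - 0.73)  # share of all questions answered confidently but wrongly
print(f"L99 committed {committed:.0%}, committed-and-wrong about {committed_wrong:.0%}")
```

The last figure makes the stated risk concrete: if the 73% correctness holds, roughly a quarter of all binary questions would receive a committed answer that is wrong.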

Key facts

01. 40 Claude prompt codes were blind A/B tested by author Samarth Bhamare in a self-funded, single-rater study from March–April 2026.
02. Only 7 of the 40 codes measurably changed Claude's reasoning; the other 33 changed only tone, confidence, or formatting.
03. All 7 effective codes use rejection logic (telling Claude what framings to refuse) rather than additive instructions.
04. `/skeptic` caught a wrong premise in 11 of 14 cases (79%) vs. 2 of 14 (14%) for baseline, a 5.5× improvement and the largest delta in the dataset.
05. `L99` forced a single committed answer in 11 of 12 binary decision questions vs. 2 of 12 at baseline, with 73% correctness when committed.
06. `/blindspots` surfaced at least one material hidden assumption in 9 of 11 strategy/architecture questions.
07. Each code was tested with n=12–20 runs, fresh context per run, and fixed task batteries across coding, analysis, and writing.

Topics

#prompt-engineering #benchmarks #reasoning #claude #empirical-study

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 23, 2026 · 11:04 UTC.
