Blind A/B test of 40 Claude prompt codes finds only 7 shift reasoning
Author Samarth Bhamare tested 40 circulated Claude prompt codes and found that only 7 actually changed Claude's reasoning, while the other 33 merely altered its tone or style.
Practitioners building Claude-based coding agents or prompt pipelines should prioritize rejection-logic prefixes like `/skeptic` and `L99` over additive "be more expert" instructions, which this study found produced no measurable reasoning improvement.
Samarth Bhamare conducted a self-funded, single-rater study from March–April 2026, testing 40 of the most-circulated Claude prompt codes against no-prefix baselines. Each code received n=12–20 runs with fresh context per run, fixed task batteries across coding, analysis, and writing, and blind ordering between run and rating. A 15,000-word companion PDF report with raw results by code is linked in the appendix of the gist.
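The protocol above compares prefix runs with no-prefix runs, uses a fresh context for each run, and blinds ordering before rating. It can be sketched roughly as follows. This is a minimal illustration, not Bhamare's actual harness. Several pieces are assumptions invented for illustration:

- `run_model` is a hypothetical stand-in for a single-turn Claude call that starts a new conversation each time.
- The `collect` and `blind` helpers are hypothetical.
- The prefix format (code, blank line, task) is assumed.
- `n` is taken to mean runs per condition per task.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    condition: str  # "baseline" or the prefix code; hidden from the rater
    task: str
    output: str


def collect(run_model: Callable[[str], str], code: str,
            tasks: list[str], n: int) -> list[Sample]:
    """Run every task n times with and without the prefix code.

    Assumes run_model opens a fresh, history-free conversation per call,
    mirroring the study's "fresh context per run" rule.
    """
    samples = []
    for task in tasks:
        for _ in range(n):
            samples.append(Sample("baseline", task, run_model(task)))
            # Hypothetical prefix format: code, blank line, then the task.
            samples.append(Sample(code, task, run_model(f"{code}\n\n{task}")))
    return samples


def blind(samples: list[Sample], seed: int) -> tuple[list[tuple[str, str, str]], dict[str, str]]:
    """Shuffle outputs and replace conditions with opaque IDs for rating."""
    rng = random.Random(seed)
    order = samples[:]
    rng.shuffle(order)
    sheet, key = [], {}
    for i, s in enumerate(order):
        sid = f"S{i:04d}"
        sheet.append((sid, s.task, s.output))  # what the rater sees
        key[sid] = s.condition                 # sealed until rating is done
    return sheet, key


# Usage with a stub in place of a real model call (hypothetical).
def fake_model(prompt: str) -> str:
    return "Commit: no." if prompt.startswith("L99") else "It depends."


samples = collect(fake_model, "L99", ["Should we rewrite the service in Rust?"], n=3)
sheet, key = blind(samples, seed=0)
```

The rater scores `sheet` row by row using only the opaque IDs. Scores are joined back to `key` only after rating is finished, so the rater never knows which condition produced an output.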
The headline finding is that only 7 of the 40 codes produced measurable shifts in Claude's actual reasoning or decisions. The other 33 produced stylistic changes — shorter responses, less hedging, more confident tone — without altering the underlying logic. Bhamare notes this isn't worthless, but it isn't the cognitive "unlock" that viral marketing around these codes implies. The key structural insight: every effective code uses rejection logic (telling Claude what framings to refuse) rather than additive instructions (telling Claude to be more of something).
The top performer was `/skeptic`, which tells Claude to challenge the premise of a question before answering. On 14 "should I do X" decisions where the obvious answer was wrong, `/skeptic` caught the flawed premise in 11 of 14 cases (79%), versus 2 of 14 (14%) for baseline Claude — a 5.5× improvement and the largest single delta in the dataset. `L99` forced a single committed answer instead of "it depends" essays, achieving commitment in 11 of 12 binary decision questions (versus 2 of 12 at baseline), though correctness when committed was 73%, meaning confident-but-wrong outputs are a real risk. `/blindspots` surfaced at least one material hidden assumption in 9 of 11 strategy and architecture questions.
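The percentages and the 5.5× figure follow directly from the raw counts. The `L99` correctness rate also implies how many committed answers were wrong. The small Python check below rests on one assumption the summary does not state: that the 73% is computed over the 11 committed answers. Under that assumption, 8 of 11 (72.7%) answers were correct and 3 were confident but wrong.

```python
def rate(hits: int, trials: int) -> float:
    """Share of trials in which the target behaviour was observed."""
    return hits / trials


def lift(treated: float, baseline: float) -> float:
    """How many times more often the prefix produced the behaviour."""
    return treated / baseline


# /skeptic: flawed premise caught in 11 of 14 vs. 2 of 14 with no prefix.
skeptic, skeptic_base = rate(11, 14), rate(2, 14)
print(f"/skeptic {skeptic:.0%} vs {skeptic_base:.0%}, lift {lift(skeptic, skeptic_base):.1f}x")
# prints "/skeptic 79% vs 14%, lift 5.5x"

# L99: committed in 11 of 12 binary questions; 73% correct when committed.
committed = 11
correct = round(0.73 * committed)  # assumption: 73% is over the 11 committed answers
wrong_but_confident = committed - correct
print(f"L99 commit rate {rate(committed, 12):.0%}, correct {correct}/{committed}, "
      f"confident-but-wrong {wrong_but_confident}")
# prints "L99 commit rate 92%, correct 8/11, confident-but-wrong 3"

# /blindspots: material hidden assumption surfaced in 9 of 11 questions.
print(f"/blindspots {rate(9, 11):.0%}")
# prints "/blindspots 82%"
```

Each code was scored on only 12 to 14 questions. At that size, a single question moves a rate by roughly 7 to 8 percentage points.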
Key facts
- 40 Claude prompt codes were blind A/B tested by author Samarth Bhamare in a self-funded, single-rater study from March–April 2026.
- Only 7 of the 40 codes measurably changed Claude's reasoning; the other 33 changed only tone, confidence, or formatting.
- All 7 effective codes use rejection logic (telling Claude what framings to refuse) rather than additive instructions.
- `/skeptic` caught a wrong premise in 11 of 14 cases (79%) vs. 2 of 14 (14%) for baseline, a 5.5× improvement and the largest delta in the dataset.
- `L99` forced a single committed answer in 11 of 12 binary decision questions vs. 2 of 12 at baseline, with 73% correctness when committed.