AI agent audits 200 shadcn-ui PRs, finds 69 duplicate submissions
Chinmay Mhatre built an AI-powered PR audit tool that analyzed 200 pull requests in the shadcn-ui/ui repo and flagged 69 valid redundancies by detecting "goal duplication" across disjoint files — not just copy-pasted code.
Chinmay Mhatre built an AI-powered PR auditing system to combat the growing problem of duplicate pull requests flooding high-traffic open source repos. Running it against 200 recent PRs in the shadcn-ui/ui repository, the tool flagged 69 valid redundancies — including cases where three separate PRs modified completely different files (`apps/www/config/docs.ts`, `apps/www/lib/utils.ts`, and `apps/www/registry/registry.json`) but were all targeting the same broken `/blocks` page link. The system classifies matches into three buckets: SHADOW (exact duplicate fix), SUPERSET (broader fix covering a narrower one), and COMPETING (different approaches to the same functional outcome).
Mhatre set out to tackle a growing pain point for open source maintainers: AI coding agents flooding high-traffic repositories with duplicate PRs that address identical problems in superficially different ways. The 69 valid redundancies surfaced in the shadcn-ui/ui test bed are far more than simple code clone detection would have caught. The key insight is that the system targets "Goal Duplication," evaluating the **architectural intent** of a PR rather than its literal code changes. Under this scheme, a SHADOW is an identical fix for the same regression, a SUPERSET is a broader fix that makes a narrower one redundant, and a COMPETING pair consists of two divergent strategies aimed at the same functional outcome.
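To make the taxonomy concrete, here is a minimal sketch of how a finding could be represented. The `Redundancy`, `Finding`, and `describe` names and the field layout are hypothetical, chosen for illustration; the source does not describe the tool's actual data model. The example pair is the COMPETING match reported for the iOS date input bug.

```python
from dataclasses import dataclass
from enum import Enum


class Redundancy(Enum):
    """The three buckets named in the write-up (wording paraphrased)."""
    SHADOW = "identical fix for the same regression"
    SUPERSET = "broader fix that makes a narrower one redundant"
    COMPETING = "divergent strategies for the same functional outcome"


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one flagged pair of pull requests."""
    pr_a: int
    pr_b: int
    kind: Redundancy
    shared_goal: str  # the intent both PRs are judged to share


def describe(finding: Finding) -> str:
    """Render a finding as a one-line report entry."""
    return (
        f"#{finding.pr_a} vs #{finding.pr_b}: "
        f"{finding.kind.name} ({finding.shared_goal})"
    )


example = Finding(10158, 10133, Redundancy.COMPETING, "iOS date input rendering")
print(describe(example))
```

Keeping the shared goal as its own field reflects the point of the approach: two PRs are paired by what they are trying to achieve, not by which files they touch.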
The pipeline works in three stages. First, PR diffs are aggressively compressed to stay within free-tier token budgets: SVGs, lockfiles, and documentation are stripped; comments and unchanged imports are removed; and, if the result still exceeds 1,500 characters, only the modified `+` and `-` hunks are kept. Second, the cleaned diff is embedded using Gemini embedding models and indexed into Upstash Vector, where each new PR is compared against its 8 most similar candidates. A critical early problem at this stage was structural bias: the vector search flagged dissimilar JSON additions as duplicates purely because of shape similarity, which Mhatre corrected by prioritizing literal values such as IDs and URLs over syntax. Third, the top candidates pass through a deep-reasoning LLM loop (routed between Gemini and Llama to handle rate limits) that determines whether the authors' underlying intents actually overlap. Real-world matches included three concurrent SHADOW duplicates adding `aria-label` attributes to Data Tables (among them PRs `#10421` and `#10402`), and a COMPETING pair (`#10158` vs. `#10133`) in which one PR used a global CSS fix and the other a component-level fix for the same iOS date input rendering bug.
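The first two stages can be sketched roughly as follows. This is an illustration under stated assumptions, not Mhatre's implementation: the skipped file suffixes, the comment heuristic, and all function names are hypothetical, and a plain in-memory cosine ranking stands in for the Upstash Vector query so the example runs without any service. Only the 1,500-character threshold and the choice of 8 candidates come from the write-up.

```python
import math
import re

MAX_CHARS = 1500  # threshold quoted in the write-up
TOP_K = 8         # candidates compared per PR, per the write-up
# Assumed noise filters; the real lists are not given in the source.
SKIP_SUFFIXES = (".svg", ".lock", ".md", ".mdx")
SKIP_NAMES = ("package-lock.json", "pnpm-lock.yaml", "yarn.lock")


def split_files(diff: str) -> list[str]:
    """Split a unified diff into per-file chunks at 'diff --git' headers."""
    chunks = re.split(r"(?m)^(?=diff --git )", diff)
    return [c for c in chunks if c.strip()]


def is_noise(chunk: str) -> bool:
    """True when the chunk's target path is an SVG, lockfile, or doc file."""
    path = chunk.splitlines()[0].split(" b/")[-1]
    return path.endswith(SKIP_SUFFIXES) or path.rsplit("/", 1)[-1] in SKIP_NAMES


def is_droppable(line: str) -> bool:
    """Heuristic: comment-only lines, plus unchanged (context) import lines."""
    body = line[1:].strip()
    if body.startswith(("//", "/*")):
        return True
    return line.startswith(" ") and body.startswith("import ")


def compress_diff(diff: str) -> str:
    """Strip noise; fall back to bare +/- lines if still over budget."""
    kept = [c for c in split_files(diff) if not is_noise(c)]
    lines = [l for c in kept for l in c.splitlines() if not is_droppable(l)]
    text = "\n".join(lines)
    if len(text) > MAX_CHARS:
        changed = [
            l for l in lines
            if l[:1] in ("+", "-") and not l.startswith(("+++", "---"))
        ]
        text = "\n".join(changed)
    return text


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_candidates(query: list[float], index: dict[int, list[float]],
                   k: int = TOP_K) -> list[int]:
    """Return the PR numbers of the k embeddings most similar to the query."""
    ranked = sorted(index, key=lambda pr: cosine(query, index[pr]), reverse=True)
    return ranked[:k]
```

In a setup like this, the embeddings passed to `top_candidates` would come from an embedding model applied to the output of `compress_diff`, and the shortlisted PRs would then be handed to the reasoning stage for an intent-level verdict.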