CLI tool `dug` ranks files by relevance before sending bugs to Claude
`dug` is a CLI built by Ratish Jain that automatically ranks codebase files by relevance to a bug description using graph traversal, semantic embeddings, and a learning layer built from past fixes — eliminating the manual context-gathering step before handing off to Claude.
Score breakdown
The tool's learned history layer means file-ranking accuracy compounds over time from a team's actual debugging record — something stateless search tools like grep cannot do.
- 01Ratish Jain built `dug`, a CLI that automates file-context gathering before AI-assisted debugging.
- 02`dug init` indexes the codebase; `dug "your error"` outputs a ranked file list with relevance scores.
- 03Graph-based scoring awards +10 for a directly mentioned file, +8 for an importing file or a commit-matched file, and +2 for any recently modified file.
Ratish Jain describes building `dug`, a CLI tool designed to eliminate the manual archaeology of finding relevant files before using an AI coding assistant like Claude. The workflow is two commands: `dug init` indexes the codebase once, and `dug "your error"` outputs a ranked list of files with relevance scores and a suggested starting point, ready to paste into Claude.
First, a directed graph built during `dug init` connects FILE nodes to SYMBOL nodes (functions, classes), other FILE nodes (imports), and COMMIT nodes.
The ranking combines three independent signals. First, a directed graph built during `dug init` connects FILE nodes to SYMBOL nodes (functions, classes), other FILE nodes (imports), and COMMIT nodes. At query time, graph traversal assigns points based on whether a file is directly mentioned (+10), imports a mentioned file (+8), was modified in a commit matching the error (+8), or was modified in any recent commit (+2). Second, when graph signals fail — such as when the error text contains no file names — semantic search takes over: every function body is embedded using `fastembed` (ONNX-based, fully local, no PyTorch) with the `sentence-transformers/all-MiniLM-L6-v2` model (384 dimensions) and stored in LanceDB; cosine similarity against the query adds up to +5 points.
The third and differentiating layer is learned history. After a fix, `dug solved` prompts the user to confirm which files actually contained the bug, saving the result to `.dug/history.json`. Future queries run `find_similar_past_bugs()`, which scores past entries using a blend of character-level `SequenceMatcher` and word-level Jaccard similarity (weighted 0.6), shared file/symbol overlap (0.25), and error type match (0.2). Matching past bugs boost file scores by up to +6 points scaled by similarity, and a separate error-pattern boost adds up to +3 points when a file has appeared repeatedly in fixes for the same error class. The article notes the tool is not ML training in the PyTorch sense, but weighted frequency built from a team's actual debugging history.
Key facts
- 01Ratish Jain built `dug`, a CLI that automates file-context gathering before AI-assisted debugging.
- 02`dug init` indexes the codebase; `dug "your error"` outputs a ranked file list with relevance scores.
- 03Graph-based scoring awards +10 for a directly mentioned file, +8 for an importing file or a commit-matched file, and +2 for any recently modified file.
- 04Semantic search uses `fastembed` (ONNX, fully local) with `sentence-transformers/all-MiniLM-L6-v2` (384 dimensions) stored in LanceDB, adding up to +5 points.
- 05`dug solved` records which files actually contained the fix in `.dug/history.json`, enabling a learning layer.
- 06Past-bug similarity scoring blends character-level SequenceMatcher and word-level Jaccard (weight 0.6), signal overlap (0.25), and error-type match (0.2), boosting file scores by up to +6 points.
- 07A separate error-pattern boost adds up to +3 points when a file repeatedly appears in fixes for the same error class.
Topics
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 16, 2026 · 23:11 UTC. How this works →