Real-time AI deposition analysis with Deepgram and Claude
John Mahoney details how Courtroom AI uses Deepgram Nova-3 and Claude Haiku-4-5 to deliver structured legal analysis of live deposition testimony to attorneys in roughly 4 seconds per segment.
Developers building real-time AI tools for legal or compliance work can directly apply three production fixes described here: diagnosing token-budget truncation by checking the response's `stop_reason`, keeping idle WebSocket connections alive with periodic pings, and replacing hallucinated citations with grounded API lookups. Each one avoids a costly failure the team hit in production.
John Mahoney describes the architecture and production lessons behind Courtroom AI, a tool that listens to live legal depositions and pushes structured analysis to an attorney's browser tab in near real time. A two-hour witness deposition can produce 12,000–25,000 words of transcript. The system routes audio through Deepgram Nova-3 for streaming speech recognition, then through a single Node.js WebSocket server to Claude Haiku-4-5, which returns a strict 12-key JSON object for each testimony segment. That object covers medical accuracy scores, Daubert vulnerability scores, prior-testimony inconsistencies, suggested cross-examination questions, tort elements (duty, breach, causation, damages), admission and evasion detection, Federal Rules of Evidence (FRE) foundation triggers, chart contradictions, and literature hits. The full round trip takes roughly 4 seconds. That figure depends on batching all 12 analyses into one Claude call; 12 sequential calls at Haiku's average response time would have taken roughly 18 seconds.
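Batching only pays off if the single response reliably contains every section. A minimal TypeScript sketch of that check follows, assuming the model returns one JSON object as plain text. The key names are hypothetical: the article lists the analysis categories but not the exact schema, so `EXAMPLE_KEYS` holds four illustrative keys rather than the real twelve.

```typescript
// Hypothetical key names for illustration only; the production schema has
// 12 keys whose exact names are not given in the article.
const EXAMPLE_KEYS = [
  "medical_accuracy",
  "daubert_vulnerability",
  "prior_testimony_inconsistencies",
  "cross_exam_questions",
] as const;

type SegmentAnalysis = Record<string, unknown>;

// Parse one batched model response and require exactly the expected
// top-level keys: none missing and none extra.
function parseSegmentAnalysis(
  raw: string,
  requiredKeys: readonly string[],
): SegmentAnalysis {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`Model output is not valid JSON: ${(err as Error).message}`);
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Model output must be a single JSON object");
  }
  const obj = parsed as SegmentAnalysis;
  const missing = requiredKeys.filter(
    (k) => !Object.prototype.hasOwnProperty.call(obj, k),
  );
  const extra = Object.keys(obj).filter((k) => !requiredKeys.includes(k));
  if (missing.length > 0 || extra.length > 0) {
    throw new Error(
      `Schema mismatch; missing: [${missing.join(", ")}], extra: [${extra.join(", ")}]`,
    );
  }
  return obj;
}
```

In a real deployment the `requiredKeys` list would carry all 12 production keys. Rejecting extra keys as well as missing ones also catches a model that silently renames a field.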
Three production failures shaped the final design.

First, dense testimony segments routinely needed 6,000–8,000 output tokens, but Haiku-4-5's default response budget is 4,096 tokens. Truncated output surfaced as JSON parse errors rather than as a clear `max_tokens` signal, which cost hours of debugging before the team raised `max_tokens` to 8192.

Second, Railway's Cloudflare layer closes WebSocket connections that sit idle for 100 seconds, a limit easily exceeded when a witness pauses to review a document. The fix was a client-side ping every 25 seconds, scheduled with a `useEffect` interval, with the server echoing a simple pong.

Third, early prompts asked Claude to generate PubMed citations, complete with PMIDs and author names, to counter "in my experience" claims from expert witnesses; the citations looked convincing but were largely fabricated. Citing Mata v. Avianca, the 2023 case in which attorneys were sanctioned for filing ChatGPT-invented case citations, as a cautionary parallel, Mahoney replaced this with a workflow in which Claude generates only a PubMed search query, which is then run against the real NCBI API.
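The token-budget failure becomes obvious if the code checks why generation stopped before it parses anything. Anthropic's Messages API reports this in the response's `stop_reason` field, which is `"max_tokens"` when output is cut off at the budget. The sketch below uses small local types that mirror only the fields it needs instead of importing the SDK; `extractJson` and `TruncatedOutputError` are hypothetical names, not part of any library.

```typescript
// Minimal local types mirroring the parts of a Messages API response used
// here; the real SDK types carry many more fields.
interface TextBlock {
  type: "text";
  text: string;
}
interface MessageLike {
  content: Array<TextBlock | { type: string }>;
  stop_reason: string | null;
}

class TruncatedOutputError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TruncatedOutputError";
  }
}

// Check the stop reason first, so a budget truncation is reported as such
// instead of showing up later as a confusing JSON syntax error.
function extractJson(message: MessageLike, maxTokens: number): unknown {
  if (message.stop_reason === "max_tokens") {
    throw new TruncatedOutputError(
      `Output hit max_tokens=${maxTokens}; raise the budget or shrink the segment`,
    );
  }
  const text = message.content
    .filter((b): b is TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("");
  return JSON.parse(text);
}
```

With this guard, a segment that needs 7,000 tokens under a 4,096-token budget fails with an error naming the budget, which points straight at the `max_tokens` request parameter.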
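The keepalive fix for the idle-timeout failure can be sketched as two small functions. Browser `WebSocket` objects cannot send protocol-level ping frames, so the ping here is an ordinary application message. The `{"type":"ping"}` format and both function names are assumptions for illustration, not Courtroom AI's actual protocol. The client helper returns a cleanup function so it can be returned directly from a React `useEffect`.

```typescript
// Hypothetical application-level keepalive messages.
const PING = JSON.stringify({ type: "ping" });
const PONG = JSON.stringify({ type: "pong" });

// Client side: send a ping every `intervalMs` and return a cleanup function.
// In React this can be returned straight from useEffect, e.g.
//   useEffect(() => startKeepalive((m) => socket.send(m)), [socket]);
function startKeepalive(send: (msg: string) => void, intervalMs = 25_000): () => void {
  const id = setInterval(() => send(PING), intervalMs);
  return () => clearInterval(id);
}

// Server side: answer pings with a pong; return null for anything else so
// the normal message handler can deal with it. With the `ws` package this
// might be wired up as:
//   socket.on("message", (data) => {
//     const reply = keepaliveReply(data.toString());
//     if (reply) socket.send(reply);
//   });
function keepaliveReply(raw: string): string | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null; // not JSON, so not a keepalive message
  }
  const isPing =
    typeof msg === "object" && msg !== null && (msg as { type?: unknown }).type === "ping";
  return isPing ? PONG : null;
}
```

A 25-second interval keeps the connection well inside the 100-second idle limit even if one or two pings are delayed.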
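The grounded citation workflow splits the work: the model writes only a search string, and PubMed supplies the identifiers. A minimal sketch against NCBI's E-utilities `esearch` endpoint, which returns matching PMIDs as JSON under `esearchresult.idlist`, follows. The function names are hypothetical, and the sketch ignores NCBI's rate limits and optional API key.

```typescript
// NCBI E-utilities esearch endpoint. The model supplies only `query`;
// every PMID returned here comes from PubMed itself.
const ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

// URLSearchParams escapes the model-written query safely.
function buildEsearchUrl(query: string, retmax = 5): string {
  const params = new URLSearchParams({
    db: "pubmed",
    term: query,
    retmode: "json",
    retmax: String(retmax),
  });
  return `${ESEARCH_URL}?${params.toString()}`;
}

// Pull PMIDs out of an esearch JSON body; anything malformed yields [].
function parsePmids(body: unknown): string[] {
  const ids = (body as { esearchresult?: { idlist?: unknown } } | null)?.esearchresult?.idlist;
  return Array.isArray(ids) ? ids.filter((x): x is string => typeof x === "string") : [];
}

// Run the query against PubMed (needs a runtime with fetch, e.g. Node 18+).
async function searchPubMed(query: string, retmax = 5): Promise<string[]> {
  const res = await fetch(buildEsearchUrl(query, retmax));
  if (!res.ok) throw new Error(`esearch failed: HTTP ${res.status}`);
  return parsePmids(await res.json());
}
```

Because every PMID comes from the search response rather than from model output, a result can be missing or off-topic but cannot be invented. Fetching titles and authors for display would be a follow-up call, for example to the `esummary` utility.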
Key facts
- Live deposition transcripts run 12K–25K words over two hours and are processed via Deepgram Nova-3 streaming ASR.
- Claude Haiku-4-5 returns a single 12-key JSON schema per testimony segment covering medical accuracy, Daubert vulnerabilities, prior-testimony contradictions, evasion patterns, FRE triggers, and more.
- Batching all 12 analyses into one Claude call keeps latency to ~4 seconds; 12 sequential calls would have taken ~18 seconds.