Apr 21, 2026·1 min readApplications & Use Cases

Real-time AI deposition analysis with Deepgram and Claude

John Mahoney details how Courtroom AI uses Deepgram Nova-3 and Claude Haiku-4-5 to deliver structured legal analysis of live deposition testimony to attorneys in roughly 4 seconds per segment.

Dev.to #claude·John Mahoney

Read at source

Composite

5.2

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

Developers building real-time AI legal or compliance tools can directly apply these three production fixes — token budget diagnosis via `finish_reason`, WebSocket keepalive patterns, and replacing hallucinated citations with grounded API lookups — to avoid the same costly failures.

01Live deposition transcripts run 12K–25K words over two hours and are processed via Deepgram Nova-3 streaming ASR.
02Claude Haiku-4-5 returns a single 12-key JSON schema per testimony segment covering medical accuracy, Daubert vulnerabilities, prior-testimony contradictions, evasion patterns, FRE triggers, and more.
03Batching all analysis into one Claude call achieves ~4-second latency; 12 sequential calls would have taken ~18 seconds.

Summary— our read of the original

John Mahoney describes the architecture and production lessons behind Courtroom AI, a tool that listens to live legal depositions and pushes structured analysis to an attorney's browser tab in near real time. Two-hour witness depositions can produce 12,000–25,000 words of transcript. The system routes audio through Deepgram Nova-3 for streaming ASR, into a single Node.js WebSocket server, and then to Claude Haiku-4-5, which returns a strict 12-key JSON object per testimony segment. That object covers medical accuracy scores, Daubert vulnerability scores, prior-testimony inconsistencies, suggested cross-examination questions, tort elements (duty, breach, causation, damages), admission and evasion detection, FRE foundation triggers, chart contradictions, and literature hits. The entire round trip lands in roughly 4 seconds — achieved by batching all 12 analyses into one Claude call rather than 12 sequential calls, which would have added roughly 18 seconds of latency at Haiku's average response time.

First, dense testimony segments routinely required 6,000–8,000 output tokens, but Haiku-4-5's default response budget is 4,096 tokens.

Three production failures shaped the final design. First, dense testimony segments routinely required 6,000–8,000 output tokens, but Haiku-4-5's default response budget is 4,096 tokens. Truncated output surfaced as JSON parse errors rather than a clear `max_tokens` signal, costing hours of debugging before the team raised `max_tokens` to 8192. Second, Railway's Cloudflare layer closes idle WebSocket connections after 100 seconds — easily exceeded when a witness pauses to review a document. The fix was a client-side ping every 25 seconds using a `useEffect` interval, with the server echoing a simple pong. Third, early prompts asked Claude to generate PubMed citations with PMIDs and author names to counter "in my experience" expert claims; the citations were convincing but largely fabricated. Invoking the Mata v. Avianca precedent as a cautionary parallel, Mahoney replaced this with a workflow where Claude generates only a PubMed search query, which is then executed against the real NCBI API.

Key facts

01Live deposition transcripts run 12K–25K words over two hours and are processed via Deepgram Nova-3 streaming ASR.
02Claude Haiku-4-5 returns a single 12-key JSON schema per testimony segment covering medical accuracy, Daubert vulnerabilities, prior-testimony contradictions, evasion patterns, FRE triggers, and more.
03Batching all analysis into one Claude call achieves ~4-second latency; 12 sequential calls would have taken ~18 seconds.
04Haiku-4-5's default 4,096-token response budget caused silent JSON truncation on dense segments; raising `max_tokens` to 8192 fixed it.
05Cloudflare's 100-second idle WebSocket timeout was solved with a client-side ping every 25 seconds.
06Claude-generated PubMed citations were almost entirely fabricated; the fix was having Claude produce only a search query executed against the real NCBI API.
07The frontend is a single 232 KB React + Vite bundle; the backend is one Node.js server hosted on Railway.

Topics

#tool-use #prompt-engineering #real-time-analysis #json-schema #production-deployment

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 21, 2026 · 18:16 UTC. How this works →

Apr 21, 2026·1 min readApplications & Use Cases

Real-time AI deposition analysis with Deepgram and Claude

John Mahoney details how Courtroom AI uses Deepgram Nova-3 and Claude Haiku-4-5 to deliver structured legal analysis of live deposition testimony to attorneys in roughly 4 seconds per segment.

Dev.to #claude·John Mahoney

Read at source

Composite

5.2

out of 10

Novelty · 25%

Novelty

Impact · 43%

Impact

Credibility · 12%

Credibility

Depth · 20%

Depth

Weights applied. How scores work ↗

Why it matters

01Live deposition transcripts run 12K–25K words over two hours and are processed via Deepgram Nova-3 streaming ASR.
02Claude Haiku-4-5 returns a single 12-key JSON schema per testimony segment covering medical accuracy, Daubert vulnerabilities, prior-testimony contradictions, evasion patterns, FRE triggers, and more.
03Batching all analysis into one Claude call achieves ~4-second latency; 12 sequential calls would have taken ~18 seconds.

Summary— our read of the original

First, dense testimony segments routinely required 6,000–8,000 output tokens, but Haiku-4-5's default response budget is 4,096 tokens.

Key facts

01Live deposition transcripts run 12K–25K words over two hours and are processed via Deepgram Nova-3 streaming ASR.
02Claude Haiku-4-5 returns a single 12-key JSON schema per testimony segment covering medical accuracy, Daubert vulnerabilities, prior-testimony contradictions, evasion patterns, FRE triggers, and more.
03Batching all analysis into one Claude call achieves ~4-second latency; 12 sequential calls would have taken ~18 seconds.
04Haiku-4-5's default 4,096-token response budget caused silent JSON truncation on dense segments; raising `max_tokens` to 8192 fixed it.
05Cloudflare's 100-second idle WebSocket timeout was solved with a client-side ping every 25 seconds.
06Claude-generated PubMed citations were almost entirely fabricated; the fix was having Claude produce only a search query executed against the real NCBI API.
07The frontend is a single 232 KB React + Vite bundle; the backend is one Node.js server hosted on Railway.

Topics

#tool-use #prompt-engineering #real-time-analysis #json-schema #production-deployment

Methodology

Score breakdown

Key facts

Topics

Score breakdown

Key facts

Topics