Playwright + Claude Code pipeline automates tutorial video production
u/SpeedyBrowser45 built a fully automated tutorial video pipeline using Claude Code and Playwright that produces synced voice-over, cursor animations, UI annotations, background music, and a branded end card with zero manual editing.
The pipeline replaces per-video manual recording and editing with a code-driven workflow. Per-video cost drops to a few cents of TTS instead of a SaaS seat, and language variants can be produced without re-recording.
u/SpeedyBrowser45 shares a pipeline built to eliminate the manual effort of producing sales and tutorial videos for production web apps. The workflow has six stages, all orchestrated through Claude Code. First, Claude analyzes the target app pages and writes a single file containing the step sequence, the voice-over narration, and a list of UI elements to annotate (buttons, cards, menus, KPIs). Second, Claude calls ElevenLabs (or, alternatively, Gemini TTS combined with OpenAI Whisper) to generate the voice-over along with word- and character-level alignment timestamps, which are what make it possible to sync spoken words to on-screen actions. Third, Claude authors a Playwright script that drives the real app and renders a moving cursor, border highlights, labels, and "Actions" menu openings at the correct moments. Fourth, Playwright records the session using its native `recordVideo` capability, firing each annotation at its timestamp so that highlights land on the exact word being spoken; a single colored sync-marker frame at `t=0` simplifies audio/video alignment. Fifth, Claude writes an `ffmpeg` command that overlays the voice-over, adds background music ducked under the narration via sidechain compression, normalizes loudness, and appends a branded end card with a logo and CTA, producing a 1080p MP4.
Sixth, iteration passes follow, since selectors and timing typically need adjustment.
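To make the timestamp-driven sync in the second through fourth stages concrete, here is a minimal sketch of turning word-level alignment data into a schedule of annotation cues. It is not the author's code: the `Word` and `Cue` types, the `schedule_cues` function, the `lead` parameter, and the selectors are all hypothetical, and the alignment is assumed to have already been parsed into a list of words with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the narration
    end: float


@dataclass
class Cue:
    at: float      # time (seconds) at which the annotation should fire
    selector: str  # hypothetical CSS selector of the element to highlight
    label: str


def schedule_cues(words, triggers, lead=0.15):
    """Map each (trigger_word, selector, label) to the time its word is spoken.

    Triggers are matched in narration order: each one is matched to the first
    occurrence of its word after the previous match. `lead` fires the
    annotation slightly before the word so the highlight is already visible.
    """
    cues = []
    pos = 0
    for trigger, selector, label in triggers:
        for i in range(pos, len(words)):
            # Compare case-insensitively, ignoring trailing punctuation.
            if words[i].text.strip(".,!?").lower() == trigger.lower():
                cues.append(Cue(max(0.0, words[i].start - lead), selector, label))
                pos = i + 1
                break
        else:
            raise ValueError(f"trigger word {trigger!r} not found in narration")
    return cues


if __name__ == "__main__":
    narration = [
        Word("Click", 0.0, 0.3), Word("Export", 0.35, 0.8),
        Word("to", 0.85, 0.95), Word("download", 1.0, 1.5),
        Word("the", 1.55, 1.65), Word("report.", 1.7, 2.2),
    ]
    for cue in schedule_cues(narration, [("export", "#export-btn", "Export"),
                                         ("report", "#report-card", "Report")]):
        print(f"{cue.at:.2f}s -> {cue.selector}")
```

A recording script could then sleep until each `cue.at`, measured from the sync-marker frame, before drawing the corresponding highlight.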
The author notes several practical caveats. Claude handles the production mechanics reliably, but a human still needs to choose which screens to feature, set the script's tone, and do a final watch-through; the author caught Claude writing Hindi narration in English word order. Technical pitfalls include: SPA auth tokens stored in `sessionStorage` are lost on browser restart (fix: use a persistent profile with "Remember me" so tokens land in `localStorage`); `networkidle` never fires on long-polling SPAs (fix: wait for `domcontentloaded` plus a URL wait with a capped timeout); and `ffmpeg`'s `drawtext` filter cannot shape Devanagari or Arabic scripts (fix: keep on-screen text Latin and let the voice carry the language). The author wrapped the entire pipeline into a reusable Claude Code skill and subagent, so subsequent apps require only pointing it at the relevant screens.
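The first two fixes can be sketched with Playwright's Python API. This is an illustration under assumptions, not the author's script: the function names, the profile and video directories, the URL pattern, and the `drive` callback are hypothetical. The Playwright calls used (`launch_persistent_context`, `goto` with `wait_until`, `wait_for_url` with `timeout`) are part of its documented sync API.

```python
from pathlib import Path


def context_options(video_dir, width=1920, height=1080):
    """Keyword arguments for launch_persistent_context (hypothetical helper)."""
    return {
        "viewport": {"width": width, "height": height},
        "record_video_dir": str(video_dir),
        "record_video_size": {"width": width, "height": height},
    }


def record_session(profile_dir, video_dir, start_url, landing_glob, drive,
                   timeout_ms=15_000):
    """Record one walkthrough of the app into video_dir."""
    # Imported lazily so the helper above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # A persistent profile keeps localStorage between runs, so a
        # "Remember me" login survives browser restarts.
        context = p.chromium.launch_persistent_context(
            str(profile_dir), **context_options(video_dir)
        )
        page = context.pages[0] if context.pages else context.new_page()
        # Avoid "networkidle": a long-polling SPA may never go idle.
        page.goto(start_url, wait_until="domcontentloaded")
        # Wait for the app to reach the expected route, with a capped timeout.
        page.wait_for_url(landing_glob, wait_until="domcontentloaded",
                          timeout=timeout_ms)
        drive(page)      # caller-supplied steps: cursor moves, highlights, clicks
        context.close()  # closing the context finalizes the recorded video
```

A call might look like `record_session(Path("profile"), Path("videos"), "https://app.example.com/login", "**/dashboard", my_steps)`, where the URL and `my_steps` are placeholders.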
Key facts
- The pipeline uses Claude Code, Playwright, ElevenLabs (or Gemini TTS + OpenAI Whisper), and ffmpeg to produce tutorial videos with zero manual editing.
- Voice-over timestamps (word/character alignment) are used to sync UI annotations (highlights, cursor movements, labels) to the exact spoken word.
- Playwright's native `recordVideo` capability records the live app; a single colored frame at t=0 acts as an audio/video sync marker.
- The ffmpeg step adds sidechain-compressed background music, loudness normalization, and a branded end card, outputting a 1080p MP4.
- Because the voice-over is decoupled from the recording, alternate-language versions require only translating the script and regenerating TTS, with no re-shoot.
- Known gotchas: SPA auth in `sessionStorage` dies on browser restart; `networkidle` never fires on long-polling SPAs; `ffmpeg drawtext` cannot shape Devanagari or Arabic.
- The author wrapped the full workflow into a reusable Claude Code skill and subagent for reuse across future apps.
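As a rough illustration of the audio part of the ffmpeg step listed above, the sketch below builds a command that ducks music under the narration and normalizes loudness. The function name, file names, and all parameter values (threshold, ratio, loudness target) are assumptions chosen for the example, not the author's settings, and the end card and sync-marker trimming are left out.

```python
def build_mix_command(recording, voice, music, output, lufs=-16.0):
    """Return an ffmpeg argument list: video from `recording`, audio from
    `voice` mixed over `music` that is ducked whenever the voice is present."""
    filter_graph = ";".join([
        # Scale the recording to 1080p.
        "[0:v]scale=1920:1080[v]",
        # The narration is needed twice: once in the mix, once as the sidechain key.
        "[1:a]asplit=2[vo][key]",
        # Compress the music whenever the key (narration) exceeds the threshold.
        "[2:a][key]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=400[ducked]",
        # Mix narration and ducked music; stop when the narration (first input) ends.
        "[vo][ducked]amix=inputs=2:duration=first[mix]",
        # Normalize integrated loudness of the final mix.
        f"[mix]loudnorm=I={lufs}:TP=-1.5:LRA=11[a]",
    ])
    return [
        "ffmpeg", "-y",
        "-i", recording, "-i", voice, "-i", music,
        "-filter_complex", filter_graph,
        "-map", "[v]", "-map", "[a]",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
        output,
    ]


if __name__ == "__main__":
    print(" ".join(build_mix_command("session.webm", "voice.mp3", "music.mp3", "tutorial.mp4")))
```

Returning a list rather than a shell string means the command can be passed directly to `subprocess.run(cmd, check=True)` without quoting concerns. Because only the `voice` input changes between languages, the same recording can be reused for each translated narration.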
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 9, 2026 · 17:05 UTC.