Search for a command to run...
Every processed story in chronological order, with the newest coverage first. Filter by tag, source, or score to drill in.
Teams building production document-processing pipelines should evaluate cost-per-success and consistency metrics like `pass^5` rather than peak accuracy alone, as this benchmark shows budget and mid-range models can dramatically outperform expensive SOTA models on real business OCR tasks.
Teams building production OCR pipelines can use this benchmark to avoid overpaying for SOTA models — Gemini 3 Flash matches top-tier accuracy at a fraction of the cost, and the `pass^n` consistency metric helps identify models that are reliable enough for automated workflows.