WebChallenger web agent rivals proprietary systems using open-weight models
WebChallenger is a web agent framework that achieves near-frontier benchmark performance using off-the-shelf open-weight models by addressing three architectural gaps rather than scaling model size.
Jayoo Hwang, Xiaowen Zhang, and Vedant Padwal argue that the performance gap between open-weight and proprietary LLM-based web agents stems not from insufficient model capability but from architectural shortcomings. Specifically, existing generalist agents fail to replicate three human cognitive advantages: selective attention to relevant page regions, persistent memory of website structure, and procedural fluency with common interaction patterns. WebChallenger addresses each gap through architecture design rather than model scale.
The framework's foundation is PageMem, a structured page representation built deterministically from the DOM that exposes each page as a hierarchy of semantic sections with short summaries. On top of this shared substrate, three mechanisms are constructed: a divide-and-conquer observation pipeline that lets the agent skim section summaries and drill into only task-relevant regions; a lightweight exploration and memory system that traverses each website once to build a reusable map of pages and element behaviors; and compound action workflows that collapse common multi-step interactions into single agent actions while automatically handling partial state changes. Because all three mechanisms operate over PageMem, the framework generalizes across websites without requiring site-specific adapters.
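The skim-then-drill idea behind the observation pipeline can be illustrated with a minimal sketch. This is not the released WebChallenger code: the `Section` class, the `relevance` word-overlap scorer, and the `observe` function are hypothetical names chosen for illustration, and a real agent would presumably have the language model judge relevance rather than count shared words.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One node in a hypothetical PageMem-style hierarchy of page sections."""
    title: str
    summary: str
    elements: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def relevance(task: str, section: Section) -> int:
    """Toy scorer (an assumption): number of task words that also
    appear as whole words in the section title or summary."""
    task_words = set(task.lower().split())
    section_words = set(f"{section.title} {section.summary}".lower().split())
    return len(task_words & section_words)


def observe(task: str, root: Section, min_score: int = 1) -> list[str]:
    """Skim child summaries and expand only sections that look relevant."""
    selected: list[str] = []
    for child in root.children:
        if relevance(task, child) < min_score:
            continue  # skip this region without reading its elements
        selected.extend(child.elements)
        selected.extend(observe(task, child, min_score))
    return selected


# Illustrative page: three top-level sections, only one relevant to the task.
page = Section("page", "product listing page", children=[
    Section("header", "site logo and account links", elements=["link:Sign in"]),
    Section("search", "search box to find products",
            elements=["input:query", "button:Search"]),
    Section("footer", "legal and contact links", elements=["link:Privacy"]),
])
task = "search wireless headphones"
visible = observe(task, page)
print(visible)
```

In this sketch only the elements of the search section reach the agent, so the header and footer never enter the prompt. One limitation of the toy scorer is that a section is skipped entirely when its own summary looks irrelevant, even if a nested child would have matched.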
Using off-the-shelf open-weight models without any fine-tuning, WebChallenger achieves 56.3% on WebArena, 48.7% on VisualWebArena, 51.0% on Online-Mind2Web, and 70.9% on WorkArena, approaching frontier proprietary systems at a fraction of the inference cost. The code is publicly released at https://github.com/jayoohwang1/webchallenger.
Key facts
- WebChallenger is authored by Jayoo Hwang, Xiaowen Zhang, and Vedant Padwal.
- The framework is built around PageMem, a structured page representation deterministically constructed from the DOM as a hierarchy of semantic sections.
- Three architectural mechanisms mirror human cognitive advantages: selective attention, persistent website memory, and compound action workflows.
- The system uses off-the-shelf open-weight models with no fine-tuning.
- Benchmark scores: 56.3% on WebArena, 48.7% on VisualWebArena, 51.0% on Online-Mind2Web, and 70.9% on WorkArena.
- The framework generalizes across websites without site-specific adapters.
- Code is publicly released at https://github.com/jayoohwang1/webchallenger.
Summary and scoring are generated automatically from the original article. Last processed Jun 10, 2026, 15:34 UTC.