DragOn benchmark targets drag-based GUI interaction gap
Researchers Nathan Bout, Maxime Langevin, and Ronan Riochet introduce DragOn, a benchmark and training dataset of 286K screenshots and 3.5M tasks covering drag-based GUI interactions like text highlighting, cell selection, element resizing, and slider manipulation.
Fine-tuning on DragOn's 3.5M drag-grounding tasks could improve GUI agent accuracy on the complex drag-based interactions, such as resizing, highlighting, and slider control, that current models handle poorly.
Nathan Bout, Maxime Langevin, and Ronan Riochet present DragOn, a benchmark and dataset designed to close the data gap between click-grounding and drag-grounding for GUI agents, the vision-based models that control desktops, web browsers, and mobile devices. Click-grounding has been propelled by million-scale datasets, but drag-grounding data lags by an order of magnitude, so current models underperform on complex drag-based interactions such as drag-and-drop, swiping, and highlighting.
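To make the click/drag distinction concrete, here is a minimal Python sketch. It assumes, for illustration only, that a grounding target is an axis-aligned pixel box and that a drag prediction counts as correct only when both its start and end points land in their respective boxes. The article does not describe DragOn's annotation format or scoring rule, so `Box`, `click_correct`, and `drag_correct` are hypothetical names, not the benchmark's API.

```python
from dataclasses import dataclass

# Hypothetical types for illustration; DragOn's real annotation format is not described in the article.
Point = tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned target region in screenshot pixel coordinates (assumed representation)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, p: Point) -> bool:
        x, y = p
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_correct(pred: Point, target: Box) -> bool:
    # Click grounding: a single predicted point must land inside one target region.
    return target.contains(pred)


def drag_correct(pred_start: Point, pred_end: Point, start_target: Box, end_target: Box) -> bool:
    # Drag grounding (assumed rule): both endpoints must be right, e.g. the first and
    # last character of a span to highlight, or a slider handle and its destination.
    return start_target.contains(pred_start) and end_target.contains(pred_end)
```

Under this assumed rule a drag is harder than a click because two coordinates must be grounded instead of one: a model that finds a slider handle but misplaces the drop point still fails.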
DragOn spans four interaction domains: text highlighting, cell selection, element resizing, and slider manipulation. The dataset includes 286K training screenshots and 3.5M training tasks, plus a 2,000-example held-out evaluation suite for benchmarking. The authors evaluate proprietary systems (GPT, Claude), open-weight models (Qwen, Kimi, Holo), and a Qwen VLM fine-tuned on the DragOn training data. Their results indicate that training on DragOn data could improve model performance on downstream computer-use tasks.
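With 3.5M tasks over 286K screenshots, each training screenshot carries about 12 tasks on average (3.5M / 286K ≈ 12.2). For the held-out evaluation suite, the sketch below shows one common way to report benchmark results across the four domains. The article does not say how DragOn aggregates scores; `per_domain_accuracy`, `overall_accuracy`, and `macro_accuracy` are hypothetical helpers, and the input is assumed to be one (domain, is_correct) pair per evaluated example.

```python
from collections import defaultdict

# The four interaction domains named in the article.
DOMAINS = ("text highlighting", "cell selection", "element resizing", "slider manipulation")


def per_domain_accuracy(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Hypothetical helper: fraction of correct predictions in each domain.

    `results` holds one (domain, is_correct) pair per evaluated example;
    domains with no examples are left out of the output.
    """
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for domain, ok in results:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain!r}")
        totals[domain] += 1
        hits[domain] += int(ok)
    return {d: hits[d] / totals[d] for d in DOMAINS if totals[d]}


def overall_accuracy(results: list[tuple[str, bool]]) -> float:
    """Micro average: every example counts equally."""
    if not results:
        return 0.0
    return sum(ok for _, ok in results) / len(results)


def macro_accuracy(per_domain: dict[str, float]) -> float:
    """Macro average: every domain counts equally, whatever its size."""
    if not per_domain:
        return 0.0
    return sum(per_domain.values()) / len(per_domain)
```

The macro average keeps a large domain from masking weak performance on a smaller one, while the overall (micro) accuracy weights every example equally; reporting both is a reasonable default when domain sizes differ.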
Key facts
- DragOn is a drag-grounding benchmark and training dataset for GUI agents introduced by Nathan Bout, Maxime Langevin, and Ronan Riochet.
- Drag-grounding data is currently an order of magnitude smaller than click-grounding data, limiting model performance.
- The dataset covers four domains: text highlighting, cell selection, element resizing, and slider manipulation.
- DragOn comprises 286K training screenshots and 3.5M training tasks.
- A 2,000-example held-out evaluation suite is included for benchmarking.
- Models evaluated include proprietary (GPT, Claude) and open-weight (Qwen, Kimi, Holo) systems, plus a fine-tuned Qwen VLM.
- Results suggest the dataset could improve state-of-the-art model performance on downstream computer-use tasks.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 7, 2026 · 12:45 UTC.