OpenAI launches GPT-Image-2 with top Arena leaderboard scores
OpenAI released GPT-Image-2 across ChatGPT, Codex, and its API, claiming the #1 spot on all Image Arena leaderboards with a +242 Elo lead on text-to-image over the next competitor.
Developers building agentic coding pipelines should evaluate GPT-Image-2 as a front-end for visual spec generation — producing UI mockups or diagrams that downstream agents like Codex can implement directly.
OpenAI launched GPT-Image-2 on ChatGPT, Codex, and the API, with both thinking and non-thinking variants. The model emphasizes stronger text rendering, layout fidelity, editing, multilingual support, and "thinking" for images. When paired with a thinking model, it can search the web, generate multiple candidates, check its own outputs, and produce structured artifacts such as slides, infographics, diagrams, UI mockups, and QR codes. Downstream integrations are already live from Figma, Canva, Firefly, fal, and Hermes Agent. The article calls the launch notable given a reported "focus" sprint that involved the shutdown and departure of the Sora team, which makes image generation's continued priority at OpenAI both heartening and surprising.
Arena benchmarks show a large jump: GPT-Image-2 holds the #1 position across all Image Arena leaderboards, with scores of 1512 on text-to-image, 1513 on single-image edit, and 1464 on multi-image edit, including a +242 Elo lead on text-to-image over the next model. Independent reactions highlighted that the model is not merely better at aesthetics but more practically useful for UI mockups, documentation, productivity visuals, and reference-driven design. One systems-level implication is that image generation is becoming a front-end for coding agents: generate a UI spec as an image, then have Codex or another code agent implement against that visual reference.
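The image-then-code handoff described above can be sketched as a two-stage pipeline. This is only an illustrative outline: `VisualSpec`, `build_from_visual_spec`, and the two callable types are hypothetical names introduced here, and the stub generator and agent stand in for real model calls, whose actual APIs the article does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class VisualSpec:
    """A generated mockup plus the brief that produced it (hypothetical type)."""
    prompt: str
    image_bytes: bytes


# Hypothetical interfaces: any image model and any code agent could be
# wrapped to fit these shapes. They are not real SDK signatures.
ImageGenerator = Callable[[str], bytes]
CodeAgent = Callable[[str, bytes], str]


def build_from_visual_spec(
    prompt: str,
    generate_image: ImageGenerator,
    implement: CodeAgent,
) -> Tuple[VisualSpec, str]:
    """Stage 1: render the brief as an image. Stage 2: hand it to a code agent."""
    image = generate_image(prompt)
    if not image:
        raise ValueError("image generator returned no data")
    spec = VisualSpec(prompt=prompt, image_bytes=image)
    instructions = (
        "Implement the UI shown in the attached mockup. "
        f"Original brief: {prompt}"
    )
    code = implement(instructions, spec.image_bytes)
    return spec, code


# Stubs so the sketch runs without network access or credentials.
def fake_image_generator(prompt: str) -> bytes:
    return b"\x89PNG" + prompt.encode("utf-8")


def fake_code_agent(instructions: str, image: bytes) -> str:
    return f"<!-- built from {len(image)} image bytes -->\n<main></main>"


spec, code = build_from_visual_spec(
    "Settings page with a dark-mode toggle",
    fake_image_generator,
    fake_code_agent,
)
print(code)
```

Passing the two stages in as plain callables keeps the handoff testable with stubs, so either side can be swapped for a real model without changing the pipeline.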
The roundup also covers Hugging Face's `ml-intern`, an open-source agent that automates the post-training research loop: reading papers, following citation graphs, collecting datasets, launching training jobs, evaluating runs, and iterating on failures. Reported results include GPQA scientific reasoning improving from 10% to 32% in under 10 hours on `Qwen3-1.7B`, and a healthcare setup that reportedly beat Codex on HealthBench by 60%. Separately, Cursor's reported $10B contract with xAI, which includes a right to acquire for $60B, is mentioned as a major financial story of the day. `DSPy 3.2` also shipped with RLM improvements, optimizer chaining, and LiteLLM decoupling.
Key facts
- GPT-Image-2 launched across ChatGPT, Codex, and the API with both thinking and non-thinking variants.
- Arena ranks GPT-Image-2 #1 across all Image Arena leaderboards: 1512 text-to-image, 1513 single-image edit, 1464 multi-image edit.
- GPT-Image-2 holds a +242 Elo lead on text-to-image over the next model on the Arena leaderboard.
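
To put the +242 figure in perspective, a rating gap can be converted into an expected head-to-head win rate. The sketch below assumes the standard Elo logistic formula with a 400-point scale; the article does not describe Arena's exact rating methodology, so treat the result as a rough interpretation rather than a reported number. The function name `elo_expected_score` is introduced here for illustration.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected score for the higher-rated side under the standard Elo formula.

    Assumption: a 400-point logistic scale, as in classical Elo.
    Ties count as half a win.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Figures reported in the text above.
TEXT_TO_IMAGE_SCORE = 1512
LEAD_OVER_NEXT = 242

# Score of the next model implied by the reported lead.
implied_runner_up = TEXT_TO_IMAGE_SCORE - LEAD_OVER_NEXT
win_rate = elo_expected_score(LEAD_OVER_NEXT)

print(f"implied runner-up score: {implied_runner_up}")      # 1270
print(f"expected head-to-head win rate: {win_rate:.1%}")    # 80.1%
```

Under that assumption, the reported lead corresponds to GPT-Image-2 being preferred in roughly four out of five pairwise comparisons against the next model.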