OpenAI launches GPT-Image-2 with top Arena leaderboard scores
OpenAI released GPT-Image-2 across ChatGPT, Codex, and its API, claiming the #1 spot on all Image Arena leaderboards with a +242 Elo lead on text-to-image over the next competitor.
Developers building agentic coding pipelines should note that GPT-Image-2's strong UI mockup and diagram generation makes it a practical front-end for code agents like Codex — generate a visual spec, then let an agent implement it.
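The handoff described above can be sketched roughly as follows. This is a minimal illustration, not a documented workflow: it assumes the model is reachable through the OpenAI Python SDK's `images.generate` call under the identifier `gpt-image-2` (inferred from the product name) and that the response carries a base64 payload as earlier GPT image models do. The helper names `generate_ui_spec` and `build_agent_task` are hypothetical.

```python
import base64
from pathlib import Path

# Assumed API identifier, inferred from the product name; check the model list before use.
MODEL_ID = "gpt-image-2"


def generate_ui_spec(client, prompt: str, out_path: str, model: str = MODEL_ID) -> Path:
    """Ask the image model for a UI mockup and save it as a reference image."""
    result = client.images.generate(model=model, prompt=prompt)
    # Assumes the response carries base64 image data, as with earlier GPT image models.
    image_b64 = result.data[0].b64_json
    path = Path(out_path)
    path.write_bytes(base64.b64decode(image_b64))
    return path


def build_agent_task(spec_path: Path, requirements: str) -> str:
    """Compose a plain-text task that points a code agent at the visual spec."""
    return (
        f"Implement the UI shown in {spec_path.name}.\n"
        "Treat the image as the visual reference for layout and copy.\n"
        f"Requirements: {requirements}"
    )
```

With the real SDK, `client` would be an `openai.OpenAI()` instance, and the string returned by `build_agent_task` would be passed, together with the saved image, to whichever code agent is in use. Keeping the client as a parameter makes the image step easy to stub out in tests.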
OpenAI launched GPT-Image-2, making it available across ChatGPT, Codex, and the API with both thinking and non-thinking variants. The model had previously appeared as a stealth entry on Arena before its official release. Key capabilities highlighted include stronger text rendering, layout fidelity, editing, multilingual support, and a "thinking" mode for images. When paired with a thinking model, GPT-Image-2 can search the web, generate multiple candidates, self-check outputs, and produce artifacts such as slides, infographics, diagrams, UI mockups, and QR codes. Downstream integrations at launch include Figma, Canva, Firefly, fal, and Hermes Agent.
Arena's benchmarks place GPT-Image-2 at #1 on every Image Arena leaderboard, with scores of 1512 on text-to-image, 1513 on single-image edit, and 1464 on multi-image edit, and a +242 Elo lead over the next model on text-to-image. Independent reactions characterized it as a more usable model for UI mockups, documentation, productivity visuals, and reference-driven design, not merely a prettier art generator. One systems implication drew particular attention: image generation is becoming a front-end for coding agents, since a UI spec generated as an image can serve as the visual reference that Codex or another code agent implements against.
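To put the +242 figure in perspective, the sketch below converts a rating gap into an expected head-to-head preference rate. It assumes the classic Elo logistic curve with a 400-point scale; Arena's exact rating method may differ, so treat the result as a rough reading rather than an official statistic.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model under the classic Elo curve.

    Assumption: standard base-10 logistic with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


top_score = 1512  # GPT-Image-2 text-to-image score reported by Arena
lead = 242        # reported Elo lead over the next model
runner_up = top_score - lead  # implied score of the next model

print(f"Implied runner-up score: {runner_up}")                         # 1270
print(f"Expected preference rate: {elo_expected_score(lead):.1%}")     # 80.1%
```

Under these assumptions, a 242-point gap corresponds to the leading model being preferred in roughly four out of five pairwise comparisons.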
Separately, Hugging Face released `ml-intern`, an open-source agent that automates the post-training research loop: reading papers, collecting datasets, launching training jobs, evaluating runs, and iterating on failures. Reported results include raising the GPQA scientific-reasoning score from 10% to 32% in under 10 hours on Qwen3-1.7B, and a healthcare setup that reportedly beat Codex on HealthBench by 60%. `DSPy 3.2` also shipped with RLM improvements, optimizer chaining, and LiteLLM decoupling, reflecting a broader trend of agent runtime harnesses becoming first-class engineering artifacts.
Key facts
- GPT-Image-2 launched across ChatGPT, Codex, and the API with both thinking and non-thinking variants.
- Arena ranks GPT-Image-2 #1 across all Image Arena leaderboards: 1512 text-to-image, 1513 single-image edit, 1464 multi-image edit.
- GPT-Image-2 holds a +242 Elo lead over the next model on text-to-image.
- Downstream integrations at launch include Figma, Canva, Firefly, fal, and Hermes Agent.
- Hugging Face released `ml-intern`, an open-source agent automating the post-training research loop end-to-end.