ChatGPT Images 2.0 tested against Gemini and gpt-image-1
Simon Willison benchmarks OpenAI's newly released `gpt-image-2` against `gpt-image-1` and Google's Nano Banana 2 using a "Where's Waldo"-style raccoon-with-ham-radio prompt, concluding that `gpt-image-2` at high quality takes the lead.
Developers evaluating image generation APIs should note that `gpt-image-2`'s quality gains are most apparent at maximum resolution settings, but those settings carry meaningful per-image costs that need to be factored into production budgets.
OpenAI released `gpt-image-2` on April 21, 2026. On the accompanying livestream, Sam Altman described the jump from `gpt-image-1` to `gpt-image-2` as equivalent to the leap from GPT-3 to GPT-5. To test this claim, the post uses a "Where's Waldo"-style prompt: a busy scene that hides a raccoon holding a ham radio, serving as a stress test for complex illustration quality and text rendering.
The baseline `gpt-image-1` result failed to produce a findable raccoon; Claude Opus 4.7, even with its higher-resolution inputs, could not definitively locate one either. Google's Nano Banana 2 via Gemini fared better, placing the raccoon prominently in an "Amateur Radio Club" booth with a "W6HAM" callsign pun: easy to spot, but arguably too easy. Nano Banana Pro in AI Studio produced the worst result of any model tested. A default-quality `gpt-image-2` run also failed to include a visible raccoon, but raising the settings to `outputQuality: high` and `3840x2160` (described as the maximum resolution) yielded a 17 MB PNG, converted to a 5 MB WebP, with a clearly visible raccoon in the bottom left. That image consumed 13,342 output tokens at $30 per million, or roughly 40 cents.
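The cost figure above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that only output tokens are billed for the image (input prompt tokens are ignored here) and using the token count and price quoted in the post; the function names and the example budget are hypothetical and not part of any API.

```python
from decimal import Decimal

# Figures quoted in the post for the high-quality gpt-image-2 run.
OUTPUT_PRICE_PER_MILLION_USD = Decimal("30")
OUTPUT_TOKENS_HIGH_QUALITY = 13_342


def image_cost_usd(
    output_tokens: int,
    price_per_million: Decimal = OUTPUT_PRICE_PER_MILLION_USD,
) -> Decimal:
    """Cost of one image, counting output tokens only (an assumption)."""
    return Decimal(output_tokens) * price_per_million / Decimal(1_000_000)


def images_within_budget(
    budget_usd: Decimal,
    output_tokens: int,
    price_per_million: Decimal = OUTPUT_PRICE_PER_MILLION_USD,
) -> int:
    """Whole number of images a budget covers at a fixed per-image token count."""
    return int(budget_usd // image_cost_usd(output_tokens, price_per_million))


if __name__ == "__main__":
    cost = image_cost_usd(OUTPUT_TOKENS_HIGH_QUALITY)
    print(f"Cost per image: ${cost:.4f}")
    # Hypothetical 100 USD budget, purely for illustration.
    count = images_within_budget(Decimal("100"), OUTPUT_TOKENS_HIGH_QUALITY)
    print(f"Images per $100: {count}")
```

Token counts will likely vary with prompt, resolution, and quality setting, so the single measurement from the post is a starting point for estimates and not a fixed rate.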
The post concludes that `gpt-image-2` at high quality takes the crown from Gemini for complex image generation, at least for now. A follow-up note cautions that asking models to solve their own puzzles is unreliable: a Hacker News user prompted ChatGPT to circle the raccoon in one of the images, and the model circled the wrong location, demonstrating that these models cannot be trusted to accurately identify elements in their own generated images.
Key facts
- OpenAI released `gpt-image-2` on April 21, 2026.
- Sam Altman claimed the leap from `gpt-image-1` to `gpt-image-2` equals jumping from GPT-3 to GPT-5.
- `gpt-image-1` failed to produce a findable raccoon, and Claude Opus 4.7 could not definitively locate one in the resulting image.
- Google's Nano Banana 2 placed the raccoon visibly in an "Amateur Radio Club" booth with a "W6HAM" callsign pun.
- `gpt-image-2` at `outputQuality: high` and `3840x2160` resolution produced a clearly visible raccoon in the bottom left.
- The high-quality `gpt-image-2` image used 13,342 output tokens at $30 per million, costing approximately 40 cents.