Apr 21, 2026 · 1 min read · Applications & Use Cases

ChatGPT Images 2.0 tested against Gemini and gpt-image-1

Simon Willison benchmarks OpenAI's newly released `gpt-image-2` against `gpt-image-1` and Google's Nano Banana 2 using a "Where's Waldo"-style raccoon-with-ham-radio prompt, concluding that `gpt-image-2` at high quality takes the lead.

Simon Willison (main)

Composite score: 5.5 out of 10

Weights: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

Developers evaluating image generation APIs should note that `gpt-image-2`'s quality gains showed up only at high quality and maximum resolution in this test, and that those settings cost roughly 40 cents per image, which needs to be factored into production budgets.

Summary: our read of the original

OpenAI released `gpt-image-2` on April 21, 2026. On the accompanying livestream, Sam Altman described the jump from `gpt-image-1` to `gpt-image-2` as equivalent to the leap from GPT-3 to GPT-5. To test this claim, the post uses a "Where's Waldo"-style prompt, asking for a busy scene that hides a raccoon holding a ham radio, as a stress test of complex illustration quality and text rendering.

The baseline `gpt-image-1` result failed to produce a findable raccoon; Claude Opus 4.7, even with its higher-resolution inputs, could not definitively locate one either.

Google's Nano Banana 2 via Gemini fared better, placing the raccoon prominently in an "Amateur Radio Club" booth with a "W6HAM" callsign pun: easy to spot, but arguably too easy. Nano Banana Pro in AI Studio produced the worst result of any model tested.

For `gpt-image-2`, a default-quality run also failed to include a visible raccoon. Raising the settings to `outputQuality: high` and `3840x2160` (described as the maximum resolution) yielded a 17MB PNG, converted to a 5MB WEBP, with a clearly visible raccoon in the bottom left. That image consumed 13,342 output tokens at $30/million, or roughly 40 cents.
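The 40-cent figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the image is billed purely on output tokens at the quoted rate (prompt tokens are ignored), and the function names and the 1,000-image batch are hypothetical, not taken from the original post.

```python
from decimal import Decimal

# Figures quoted in the summary.
OUTPUT_TOKENS = 13_342
USD_PER_MILLION_OUTPUT_TOKENS = Decimal("30")


def image_cost_usd(output_tokens: int, usd_per_million: Decimal) -> Decimal:
    """Cost of one image, assuming billing on output tokens only."""
    return Decimal(output_tokens) * usd_per_million / Decimal(1_000_000)


def batch_cost_usd(images: int, output_tokens: int, usd_per_million: Decimal) -> Decimal:
    """Hypothetical projection: assumes every image uses the same token count."""
    return images * image_cost_usd(output_tokens, usd_per_million)


if __name__ == "__main__":
    per_image = image_cost_usd(OUTPUT_TOKENS, USD_PER_MILLION_OUTPUT_TOKENS)
    print(f"per image: ${per_image:.4f}")  # per image: $0.4003
    batch = batch_cost_usd(1_000, OUTPUT_TOKENS, USD_PER_MILLION_OUTPUT_TOKENS)
    print(f"1,000 images: ${batch:.2f}")  # 1,000 images: $400.26
```

Real token counts will vary by prompt, quality setting, and resolution, so a batch estimate like this is a rough upper-level guide rather than a quote.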

The post concludes that `gpt-image-2` at high quality takes the crown from Gemini for complex image generation, at least for now. A follow-up note cautions that asking models to solve their own puzzles is unreliable: a Hacker News user prompted ChatGPT to circle the raccoon in one of the images, and the model circled the wrong location, a sign that these models cannot be relied on to identify elements in their own generated images.

Key facts

01 OpenAI released `gpt-image-2` on April 21, 2026.
02 Sam Altman claimed the leap from `gpt-image-1` to `gpt-image-2` equals jumping from GPT-3 to GPT-5.
03 The `gpt-image-1` image had no findable raccoon, and Claude Opus 4.7 could not definitively locate one in it.
04 Google's Nano Banana 2 placed the raccoon visibly in an "Amateur Radio Club" booth with a "W6HAM" callsign pun.
05 `gpt-image-2` at `outputQuality: high` and `3840x2160` resolution produced a clearly visible raccoon in the bottom left.
06 The high-quality `gpt-image-2` image used 13,342 output tokens at $30/million, costing approximately 40 cents.
07 Models cannot reliably identify elements in their own generated images: ChatGPT circled the wrong location when asked to find the raccoon.

Topics

#model-release #image-generation #benchmarks #comparative-analysis #prompt-engineering

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Apr 22, 2026 · 11:07 UTC.
