★ Rank 11 today · NEW · Jun 12, 2026 · 1 min read · Research Papers

Diffusion Gemma is 4x faster but hallucinates 6x more facts

A benchmark on a single H100 (FP8) found that DiffusionGemma 26B A4B generates tokens at 763 tok/s, about 3.5x the 218 tok/s of Gemma4 26B A4B, and finished its outputs in 3.7 seconds versus 15.1 (roughly 4x faster). It also produced 28 factual errors to Gemma4's 5, roughly 6x as many mistakes.

r/LocalLLaMA·u/gladkos


Composite · rank 11

5.5 out of 10

Novelty · 25%

Impact · 43%

Credibility · 12%

Depth · 20%

Weights applied.

Why it matters

DiffusionGemma's parallel token-generation architecture produces fluent but factually unreliable text, with error rates that grow as topics become more obscure — a concrete limitation that distinguishes it from its autoregressive counterpart for any fact-sensitive use case.


Summary— our read of the original

u/gladkos ran a head-to-head benchmark between DiffusionGemma 26B A4B and Gemma4 26B A4B on a single H100 in FP8 precision. Both models were given three identical writing tasks — a Steve Jobs biography, the history of Tetris, and the story of BeOS — chosen deliberately in order of decreasing topic popularity. Every factual claim in every response was then manually verified.


Gemma4 produced 45 correct facts and only 5 errors. DiffusionGemma produced 33 correct facts and 28 errors, with the error count rising as topics became more obscure: 4 mistakes on Jobs, then 12 each on Tetris and BeOS. Specific hallucinations included naming "Clara Clley" as Steve Jobs' mother, inventing a Tetris colleague for Pajitnov named "Geri Gulovik," and pricing the BeBox at $9,999 when the real price was $1,600. On throughput, DiffusionGemma ran at 763 tok/s and completed its outputs in 3.7 seconds total, compared to Gemma4's 218 tok/s and 15.1 seconds.
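The headline ratios can be rechecked from the reported figures alone. The sketch below is our own arithmetic, not something from the post: the dictionary layout and the helper names `error_rate` and `compare` are illustrative, and nothing is re-measured.

```python
# Figures exactly as reported in the post; nothing here is re-measured.
REPORTED = {
    "Gemma4": {"tok_per_s": 218, "seconds": 15.1, "correct": 45, "mistakes": 5},
    "DiffusionGemma": {"tok_per_s": 763, "seconds": 3.7, "correct": 33, "mistakes": 28},
}


def error_rate(stats):
    """Share of manually checked claims that turned out to be wrong."""
    return stats["mistakes"] / (stats["correct"] + stats["mistakes"])


def compare(baseline, candidate):
    """Ratios of candidate to baseline for speed and for factual errors."""
    return {
        "throughput_ratio": candidate["tok_per_s"] / baseline["tok_per_s"],
        "wall_clock_ratio": baseline["seconds"] / candidate["seconds"],
        "mistake_count_ratio": candidate["mistakes"] / baseline["mistakes"],
        "error_rate_ratio": error_rate(candidate) / error_rate(baseline),
    }


ratios = compare(REPORTED["Gemma4"], REPORTED["DiffusionGemma"])
for name, value in ratios.items():
    print(f"{name}: {value:.2f}x")
```

On these numbers, the "4x faster" headline fits the wall-clock ratio (about 4.1x) better than raw throughput (3.5x), and "6x" fits the raw mistake count (5.6x). Measured as a share of checked claims, DiffusionGemma was wrong on about 46% versus Gemma4's 10%, a ratio of roughly 4.6x.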

The post explains the accuracy gap through the models' differing generation strategies: DiffusionGemma places 256 tokens on screen simultaneously and iteratively polishes them for fluency, meaning a fabricated name or number that sounds plausible survives the refinement passes unchanged. Autoregressive Gemma4, by contrast, generates one token at a time and conditions each new token on all prior context. The post also notes that Google's own launch post acknowledges the quality trade-off and recommends using regular Gemma 4 when factual accuracy matters.
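To make the speed side of that trade-off concrete, here is a toy sketch that only counts forward passes. It is not DiffusionGemma's actual algorithm: `toy_predict` is a hypothetical stand-in for a model call, the value of 16 refinement steps is an arbitrary assumption (the post gives no step count), and real diffusion decoders revise tokens rather than simply committing them.

```python
import random

MASK = "<mask>"


def toy_predict(position):
    """Hypothetical stand-in for a model call: returns a token for one position."""
    return f"tok{position}"


def autoregressive_decode(length):
    """One forward pass per token; each token is produced after all earlier ones."""
    tokens, passes = [], 0
    for position in range(length):
        passes += 1  # one pass yields exactly one new token
        tokens.append(toy_predict(position))
    return tokens, passes


def diffusion_style_decode(length, refinement_steps, seed=0):
    """Fill a fixed-size block in parallel: each pass commits many positions at once."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # positions are committed in no particular left-to-right order
    per_step = -(-length // refinement_steps)  # ceiling division
    passes = 0
    for start in range(0, length, per_step):
        passes += 1  # one pass updates a whole slice of the block
        for position in order[start:start + per_step]:
            tokens[position] = toy_predict(position)
    return tokens, passes


ar_tokens, ar_passes = autoregressive_decode(256)
par_tokens, par_passes = diffusion_style_decode(256, refinement_steps=16)
print(ar_passes, par_passes)
```

In this toy setup the 256-token block takes 256 passes autoregressively and 16 in parallel, which is the kind of gap that shows up as higher tok/s. What the sketch leaves out is the accuracy side: nothing in either loop checks a token against facts, which is consistent with the post's point that a plausible-sounding fabrication can survive refinement.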

Key facts

01 Benchmark run on a single H100 in FP8 precision, comparing DiffusionGemma 26B A4B vs. Gemma4 26B A4B
02 Gemma4: 218 tok/s, 15.1s total, 45 facts correct, 5 mistakes
03 DiffusionGemma: 763 tok/s, 3.7s total, 33 facts correct, 28 mistakes
04 DiffusionGemma errors increased with topic obscurity: 4 mistakes on Jobs, 12 on Tetris, 12 on BeOS
05 Hallucinations included naming 'Clara Clley' as Jobs' mother, inventing a colleague 'Geri Gulovik' for Pajitnov, and pricing the BeBox at $9,999 (real price: $1,600)
06 DiffusionGemma generates 256 tokens simultaneously and refines for fluency, not factual accuracy
07 Google's own launch post acknowledges the quality trade-off and recommends regular Gemma 4 when facts matter

Topics

#benchmarks #model-release #reasoning #open-source

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 14, 2026 · 09:08 UTC.
