★ Rank 23 today · NEW · Jun 17, 2026 · 1 min read · Open Source

Gemma 4 E2B hits 255 tok/s in-browser via Fable 5-optimized WebGPU kernels

A WebGPU demo that runs Gemma 4 E2B in-browser at about 255 tokens per second on an M4 Max has been released. Its kernels were optimized with Fable 5 before that tool was shut down.

r/LocalLLaMA · u/xenovatech


Composite score: 5.8 out of 10 (rank 23)

Weights applied: Novelty 25% · Impact 43% · Credibility 12% · Depth 20%

Why it matters

The release shows that Fable 5's kernel optimization work left behind a publicly reusable artifact before the tool was shut down: in-browser WebGPU kernels that reach about 255 tok/s on Gemma 4 E2B.


Summary: our read of the original

u/xenovatech announced the public release of a WebGPU-based in-browser demo running Gemma 4 E2B at approximately 255 tokens per second on an M4 Max. The kernels were optimized with Fable 5's help before that tool was shut down, and the post credits Fable 5 directly for the performance gains.

The demo and kernels are hosted on Hugging Face Spaces at `webml-community/gemma-4-webgpu-kernels`, and the model used is `google/gemma-4-E2B-it-qat-mobile-transformers`, also available on Hugging Face. Both resources are open for the public to try.

Key facts

01 Gemma 4 E2B runs in-browser at ~255 tokens per second on an M4 Max using WebGPU kernels.
02 The WebGPU kernels were optimized with Fable 5 before it was shut down.
03 The demo and kernels are publicly available on Hugging Face Spaces at `webml-community/gemma-4-webgpu-kernels`.
04 The model used is `google/gemma-4-E2B-it-qat-mobile-transformers`, hosted on Hugging Face.
05 The release was posted by u/xenovatech on r/LocalLLaMA.

Topics

#webgpu #gemma-4 #open-source #in-browser #optimization

Methodology

Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 18, 2026 · 10:40 UTC.
