Gemma 4 E2B hits 255 tok/s in-browser via Fable 5-optimized WebGPU kernels
A WebGPU demo of Gemma 4 E2B running in-browser at 255 tokens per second on an M4 Max has been released, using kernels optimized with the now-shutdown Fable 5.
Score breakdown
The release demonstrates that Fable 5's kernel optimization work produced a publicly reusable artifact before the tool was shut down: in-browser WebGPU kernels capable of ~255 tok/s on Gemma 4 E2B on an M4 Max.
u/xenovatech announced the public release of a WebGPU-based in-browser demo running Gemma 4 E2B at approximately 255 tokens per second on an M4 Max. The WebGPU kernels were optimized with the assistance of Fable 5 prior to that tool's shutdown, and the post credits Fable 5 directly for the performance gains achieved.
The demo and kernels are hosted on Hugging Face Spaces at `webml-community/gemma-4-webgpu-kernels`, and the model used is `google/gemma-4-E2B-it-qat-mobile-transformers`, also available on Hugging Face. Both resources are open for the public to try.
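To put the headline figure in context, the arithmetic behind a tokens-per-second number can be sketched as below. This is a minimal illustration, not code from the demo: the function names `tokensPerSecond` and `msPerToken` are hypothetical, and the sample token count and timing are made-up values chosen to land on 255 tok/s.

```typescript
// Hypothetical helpers for illustration; not taken from the demo's source.

/** Decode throughput in tokens per second for a timed generation run. */
function tokensPerSecond(tokenCount: number, elapsedMs: number): number {
  if (elapsedMs <= 0) throw new RangeError("elapsedMs must be positive");
  return tokenCount / (elapsedMs / 1000);
}

/** Average time budget per token, in milliseconds, at a given throughput. */
function msPerToken(tokPerSec: number): number {
  if (tokPerSec <= 0) throw new RangeError("tokPerSec must be positive");
  return 1000 / tokPerSec;
}

// Made-up sample: 510 tokens generated in 2000 ms works out to 255 tok/s.
console.log(tokensPerSecond(510, 2000)); // 255

// At 255 tok/s, each token has roughly 3.92 ms on average.
console.log(msPerToken(255).toFixed(2)); // "3.92"
```

Read this way, ~255 tok/s means the whole per-token pipeline fits in a bit under 4 ms on average on the reported hardware.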
Key facts
- 01 Gemma 4 E2B runs in-browser at ~255 tokens per second on an M4 Max using WebGPU kernels.
- 02 The WebGPU kernels were optimized with Fable 5 before it was shut down.
- 03 The demo and kernels are publicly available on Hugging Face Spaces at `webml-community/gemma-4-webgpu-kernels`.
- 04 The model used is `google/gemma-4-E2B-it-qat-mobile-transformers`, hosted on Hugging Face.
- 05 The release was posted by u/xenovatech on r/LocalLLaMA.
Summary and scoring are generated automatically from the original article. We always link back to the publisher and never republish images or paywalled content. Last processed Jun 18, 2026 · 10:40 UTC.