Every processed story in chronological order, with the newest coverage first. Filter by tag, source, or score to drill in.
The release makes Model Runner V2 the default for two of the most widely deployed model families (Llama and Mistral), bringing its performance improvements — including pipeline-parallel bubble elimination and breakable CUDA graphs — to a much broader set of deployments.
Sparse attention research bottlenecks slow both human researchers and AI coding agents — Vortex's programmable serving layer removes that friction, enabling faster automated exploration of attention algorithms for long-context LLM deployments.