Unity

WebGPU LLM in Unity for NPC Interactions

An experimental Unity game runs local LLM inference in the browser using WebGPU, with the large language model (LLM) serving as the NPCs' "brain" and driving their decisions at interactive rates. The WGSL kernels were modified significantly to reduce reliance on fp16 and to support more of the operations needed for forward inference. Integrating with Unity brought unexpected challenges caused by Emscripten toolchain mismatches. The WebGPU build runs 3x-10x faster than CPU, depending on hardware, but remains about 10x less efficient than running directly on bare-metal hardware via CUDA. Optimizing the WGSL kernels could narrow this gap, and further work is needed to understand the limits of WebGPU performance. The experiment highlights both the potential and the challenges of using WebGPU for efficient in-browser AI, which could change how interactive web experiences are developed.
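The summary does not say how the kernels reduce their reliance on fp16. One common approach, shown here only as a minimal sketch and not as the author's method, is to check for the standard WebGPU `shader-f16` feature and fall back to an f32 kernel variant when it is missing. The names `KernelVariant`, `pickKernelVariant`, and `wgslHeader` are hypothetical and exist only for illustration.

```typescript
// Hypothetical helper names; not taken from the project described above.
type KernelVariant = "f16" | "f32";

// GPUSupportedFeatures is set-like, so a ReadonlySet<string> stands in for
// adapter.features here and keeps the function usable outside a browser.
function pickKernelVariant(features: ReadonlySet<string>): KernelVariant {
  return features.has("shader-f16") ? "f16" : "f32";
}

// WGSL requires the `enable f16;` directive before any f16 type is used.
// The alias lets a single kernel body be written against `scalar`.
function wgslHeader(variant: KernelVariant): string {
  return variant === "f16"
    ? "enable f16;\nalias scalar = f16;\n"
    : "alias scalar = f32;\n";
}

// In a browser, the feature set would come from the adapter, for example:
//   const adapter = await navigator.gpu.requestAdapter();
//   const variant = pickKernelVariant(adapter.features);
//   const device = await adapter.requestDevice({
//     requiredFeatures: variant === "f16" ? ["shader-f16"] : [],
//   });
//   const code = wgslHeader(variant) + kernelBody; // kernelBody is assumed
```

Prepending a header like this keeps one kernel source for both precisions, at the cost of the f32 path using twice the memory bandwidth per value.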

Posted on Jan 6, 2026 by NoiseReducer in Deep Dives, Tools

Topics: performance optimization, local inference, NPC interactions