Demystifying WebGL for ML Engineers
If you've used TensorFlow.js, you've probably noticed that it's surprisingly fast. Way faster than you'd expect for JavaScript running in a browser. The reason is WebGL, and understanding how it works will make you a better ML engineer on the web.
I went down this rabbit hole because I needed to optimize some browser-based inference (I've been running super-resolution models in the browser) and realized I had no idea what was actually happening under the hood. Here's what I found.
What Is WebGL?
WebGL is a JavaScript API that gives your browser access to the GPU. It was designed for rendering 3D graphics: games, visualizations, interactive scenes. But clever people realized that if you can run shader programs on the GPU, you can run matrix math on the GPU. And if you can run matrix math, you can run neural networks.
How TensorFlow.js Uses WebGL
TensorFlow.js doesn't use WebGL for graphics. It uses it for general-purpose GPU computing (GPGPU). Here's the trick:
- Tensor data is stored as textures. A 2D texture in WebGL is basically a 2D array of RGBA values. TF.js encodes tensor values into these textures.
- Operations are fragment shaders. Each layer of your neural network becomes a shader program. Convolutions, matrix multiplications, activations: all implemented as GLSL shaders that process texture data.
- The GPU runs them in parallel. Fragment shaders execute independently per pixel, giving you massive parallelism for free.
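To make the texture trick concrete, here's a simplified sketch of how a flat tensor can be laid out as RGBA texels: four values per texel, in a roughly square texture. This is an illustration of the idea, not TF.js's actual packing scheme, which is an internal detail of the backend.

```javascript
// Simplified sketch: store 4 tensor values per RGBA texel and
// compute the texture size needed to hold them all.
function packIntoTexels(values) {
  const texelCount = Math.ceil(values.length / 4); // R, G, B, A per texel
  const width = Math.ceil(Math.sqrt(texelCount));
  const height = Math.ceil(texelCount / width);
  // Texture data, padded out to width * height * 4 floats.
  const data = new Float32Array(width * height * 4);
  data.set(values);
  return { width, height, data };
}

// A 1024x1024 tensor (1,048,576 values) fits in a 512x512 RGBA texture.
const tex = packIntoTexels(new Float32Array(1024 * 1024));
console.log(tex.width, tex.height); // 512 512
```

Once the data lives in a texture, a fragment shader can read any value by sampling the right texel, which is what makes the rest of the pipeline possible.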
Why This Is Fast
A simple example: multiplying two 1024x1024 matrices.
- CPU (JavaScript): Each multiply-add runs sequentially. Even with WASM optimizations, you're looking at roughly a billion operations squeezed onto a few cores.
- GPU (WebGL): Each output element is computed by a separate shader invocation. The GPU has hundreds of cores running in parallel.
The speedup is typically 10-100x over the CPU backend, depending on the operation and GPU.
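To see what the CPU backend is up against, here's the naive sequential matmul in plain JavaScript. For n = 1024, the inner statement executes n^3 times:

```javascript
// Naive sequential matmul: the work a CPU backend grinds through
// one multiply-add at a time, for square n x n matrices stored
// as flat row-major arrays.
function matmul(a, b, n) {
  const c = new Float32Array(n * n);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let k = 0; k < n; k++) {
        sum += a[i * n + k] * b[k * n + j];
      }
      c[i * n + j] = sum;
    }
  }
  return c;
}

console.log(1024 ** 3); // 1073741824 inner-loop iterations
```

Notice that each output element c[i * n + j] depends only on row i of a and column j of b. That independence is exactly what a fragment shader exploits: one invocation per output element, all running at once.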
The Limitations
WebGL wasn't designed for ML, so there are some rough edges:
No true compute shaders. WebGL 2.0 still uses the graphics pipeline. You're encoding compute into rendering operations, which adds overhead. WebGPU (coming eventually) will fix this with proper compute shaders.
Texture size limits. GPUs have maximum texture dimensions (usually 4096x4096 or 16384x16384). Large tensors need to be tiled across multiple textures.
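As a rough illustration of the tiling problem (ignoring that backends pack multiple values per texel, which lowers the count), here's how many maximum-size textures a large tensor would need:

```javascript
// Rough sketch: how many maxDim x maxDim textures a tensor of
// numElements values needs if tiled naively, one value per texel.
function tilesNeeded(numElements, maxDim) {
  return Math.ceil(numElements / (maxDim * maxDim));
}

// A 16384x16384 tensor on a GPU capped at 4096x4096 textures:
console.log(tilesNeeded(16384 * 16384, 4096)); // 16
```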
Precision. WebGL doesn't guarantee 32-bit floats: many GPUs, especially on mobile, only support 16-bit (mediump) precision in shaders. If your model is sensitive to floating-point precision, you might see accuracy drops compared to FP32 on the CPU.
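You can see the kind of rounding FP16 introduces without touching a GPU. The sketch below snaps a float32 value to the nearest half-precision value, using the fact that FP16's 10-bit mantissa spaces representable values 2^(exponent - 10) apart (normal numbers only; subnormals and overflow are ignored):

```javascript
// Round a value to the nearest representable 16-bit float
// (normal range only) to show half-precision rounding error.
function roundToFloat16(x) {
  if (!isFinite(x) || x === 0) return x;
  const exp = Math.floor(Math.log2(Math.abs(x)));
  const spacing = 2 ** (exp - 10); // gap between FP16 values near x
  return Math.round(x / spacing) * spacing;
}

console.log(roundToFloat16(0.1));    // 0.0999755859375
console.log(roundToFloat16(1000.1)); // 1000
```

The second example is the one that bites: near 1000, adjacent FP16 values are 0.5 apart, so small differences simply vanish. Accumulating many such values (as a long dot product does) is where the drift shows up.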
Data transfer overhead. Moving data between CPU (JavaScript) and GPU (WebGL textures) is expensive. The key is to keep as much as possible on the GPU and minimize readbacks.
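A quick back-of-the-envelope for what a single readback costs in raw bytes, using a typical 224x224 RGB float32 input:

```javascript
// Bytes a tensor occupies, and therefore the minimum traffic for
// each upload or readback across the CPU/GPU boundary.
function tensorBytes(shape, bytesPerElement = 4) {
  return shape.reduce((acc, dim) => acc * dim, 1) * bytesPerElement;
}

console.log(tensorBytes([1, 224, 224, 3])); // 602112 bytes, ~600 KB per readback
```

That's ~600 KB crossing the boundary every time you call .data() on such a tensor, on top of the synchronization stall while the GPU pipeline drains. Chaining ops on the GPU and reading back once at the end avoids paying this repeatedly.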
Practical Tips
After spending months working with TF.js and WebGL, here's what actually helps:
Batch operations. Don't call model.predict() for each frame independently. Batch inputs when possible to amortize the overhead.
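A toy cost model shows why batching pays off when each predict() call carries a fixed dispatch overhead. The numbers here are invented for illustration, not measurements:

```javascript
// Toy cost model: each call pays a fixed dispatch overhead plus a
// per-item compute cost, so one batched call beats N separate calls.
function unbatchedCost(n, fixedMs, perItemMs) {
  return n * (fixedMs + perItemMs); // pay the overhead every call
}
function batchedCost(n, fixedMs, perItemMs) {
  return fixedMs + n * perItemMs; // pay the overhead once
}

// 30 inputs, 5 ms dispatch overhead, 2 ms of compute each:
console.log(unbatchedCost(30, 5, 2)); // 210 (ms)
console.log(batchedCost(30, 5, 2));   // 65 (ms)
```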
Warm up the model. The first inference is always slow because shaders need to compile. Run a dummy prediction on load.
```javascript
const dummy = tf.zeros([1, 224, 224, 3]);
const warmup = model.predict(dummy);
await warmup.data();
warmup.dispose();
dummy.dispose();
```

Dispose tensors. Memory leaks are the #1 issue in TF.js apps. WebGL textures don't get garbage collected automatically.
```javascript
// Bad: leaks memory every frame
const result = model.predict(input);
```

```javascript
// Good: clean up after yourself
const result = model.predict(input);
const data = await result.data();
result.dispose();
input.dispose();
```

Profile with tf.profile(). TF.js has built-in profiling that shows you where time is spent.
```javascript
const profileInfo = await tf.profile(() => {
  model.predict(input);
});
console.log(profileInfo);
```

What's Coming
WebGPU is the next generation API that will replace WebGL for compute workloads. It provides proper compute shaders, better memory management, and direct access to modern GPU features. TF.js already has an experimental WebGPU backend.
When WebGPU lands in all browsers, browser-based ML is going to get another significant speedup. If you want to see how all of this comes together in practice, I wrote a full guide on deploying ML models to the browser with TensorFlow.js, covering everything from model conversion to memory management.