
Gaussian Process Regression

The Problem with Point Estimates

Most models give you a prediction. "Q3 revenue will be $2.4M." That's a claim without calibration. It tells you nothing about what the model doesn't know: whether the estimate comes from dense, reliable data or from extrapolation into a region the model has never seen.

A Gaussian Process gives you a prediction and an honest account of its own uncertainty. Where data is dense, the confidence band collapses. Where data is sparse, it expands. The uncertainty is a function of proximity to observations: not a static disclaimer, but a geometric property of the model.

What a Gaussian Process Is

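A Gaussian Process is a distribution over functions. Not one specific function, but a distribution over all possible functions, parameterized by a mean function and a covariance kernel. For any finite set of input points, the GP says the corresponding function values are jointly Gaussian: their mean vector comes from the mean function, and their covariance matrix comes from evaluating the kernel on every pair of points.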
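To make the finite-dimensional view concrete, here is a minimal sketch that builds the mean vector and covariance matrix a GP assigns to three input points. It assumes a zero mean function and a unit-variance RBF kernel with length scale 1; both are illustrative choices, and `rbf` and `gpMarginal` are hypothetical names.

```typescript
// A kernel maps two scalar inputs to the prior covariance of f at those inputs.
type Kernel = (a: number, b: number) => number;

// Hypothetical helper: squared-exponential (RBF) kernel with unit variance.
function rbf(lengthScale = 1): Kernel {
  return (a, b) => Math.exp(-((a - b) ** 2) / (2 * lengthScale ** 2));
}

// Finite-dimensional marginal of a zero-mean GP: f(xs) ~ N(mean, cov),
// with mean[i] = 0 and cov[i][j] = k(xs[i], xs[j]).
function gpMarginal(xs: number[], k: Kernel): { mean: number[]; cov: number[][] } {
  const mean = xs.map(() => 0);
  const cov = xs.map((a) => xs.map((b) => k(a, b)));
  return { mean, cov };
}

const { mean, cov } = gpMarginal([0, 0.5, 3], rbf());
console.log(cov[0][1].toFixed(4)); // 0.8825 (nearby inputs: strongly correlated)
console.log(cov[0][2].toFixed(4)); // 0.0111 (distant inputs: nearly independent)
```

The covariance matrix is where the kernel's assumptions live: the closer two inputs are, the more strongly the GP ties their function values together.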

When observations arrive, Bayes' rule updates the prior into a posterior: a new distribution over functions that is consistent with the data. The posterior mean is the prediction. The posterior variance is the uncertainty. For fixed kernel hyperparameters, both are computed in closed form, without iterative optimization or sampling.
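In the common setting of a zero prior mean and Gaussian observation noise with variance σ_n² (both are assumptions here; the text does not fix them), the posterior at test inputs X_* has the standard closed form below. K = k(X, X) is the kernel matrix over the n training inputs, K_* = k(X, X_*) holds the covariances between training and test inputs, K_** = k(X_*, X_*), and y is the vector of observed targets.

```latex
\begin{aligned}
\boldsymbol{\mu}_* &= K_*^{\top}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{y}, \\
\Sigma_* &= K_{**} - K_*^{\top}\left(K + \sigma_n^2 I\right)^{-1} K_* .
\end{aligned}
```

The diagonal of Σ_* gives the pointwise variance. The observed values y enter only the mean. The variance depends on where the observations are, not on what was observed, which is why the band tracks proximity to data.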

The Kernel as Inductive Bias

The kernel function defines the GP's prior beliefs about the function's structure. The RBF (squared exponential) kernel assumes smooth, infinitely differentiable functions, which is appropriate when the underlying process varies gradually. The Matérn kernel produces rougher functions that are only finitely differentiable, with the degree of smoothness set by its parameter ν. These are often closer to what real-world data looks like. The periodic kernel encodes the belief that the function repeats with a fixed period.
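The three kernels named above can be sketched as TypeScript factories that return a kernel function. The hyperparameter names (`variance`, `lengthScale`, `period`) and their defaults are assumptions. ν = 5/2 is only one common member of the Matérn family; other values of ν give other degrees of smoothness.

```typescript
// A kernel maps a pair of scalar inputs to their prior covariance.
type Kernel = (a: number, b: number) => number;

// Squared exponential (RBF): infinitely differentiable sample paths.
function rbf(variance = 1, lengthScale = 1): Kernel {
  return (a, b) => {
    const r = a - b;
    return variance * Math.exp(-(r * r) / (2 * lengthScale * lengthScale));
  };
}

// Matérn with nu = 5/2: sample paths are twice differentiable, rougher than RBF.
function matern52(variance = 1, lengthScale = 1): Kernel {
  return (a, b) => {
    const s = (Math.sqrt(5) * Math.abs(a - b)) / lengthScale;
    return variance * (1 + s + (s * s) / 3) * Math.exp(-s);
  };
}

// Periodic (exp-sine-squared): covariance repeats exactly every `period` units.
function periodic(variance = 1, lengthScale = 1, period = 1): Kernel {
  return (a, b) => {
    const s = Math.sin((Math.PI * Math.abs(a - b)) / period);
    return variance * Math.exp((-2 * s * s) / (lengthScale * lengthScale));
  };
}

console.log(rbf()(0, 0)); // 1
console.log(periodic(1, 1, 2)(0, 2)); // 1: inputs one period apart are fully correlated
```

Returning closures keeps hyperparameters fixed at construction time. This makes kernels easy to pass around and to combine.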

Choosing a kernel is an act of architectural reasoning: you are deciding what you believe about the world before seeing any data. Kernels can be composed, because the sum or product of two valid kernels is itself a valid kernel. A Matérn plus a periodic kernel models a non-periodic trend with a seasonal component on top. The composition of beliefs is the composition of the kernels.
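Composition can be sketched with two small combinators, `sum` and `product` (hypothetical names). The example assumes a monthly series with a 12-step seasonal cycle. That scenario, the hyperparameter values, and the base-kernel signatures are all illustrative.

```typescript
type Kernel = (a: number, b: number) => number;

// Minimal versions of two base kernels (hyperparameter names are illustrative).
function matern52(variance: number, lengthScale: number): Kernel {
  return (a, b) => {
    const s = (Math.sqrt(5) * Math.abs(a - b)) / lengthScale;
    return variance * (1 + s + (s * s) / 3) * Math.exp(-s);
  };
}

function periodic(variance: number, lengthScale: number, period: number): Kernel {
  return (a, b) => {
    const s = Math.sin((Math.PI * Math.abs(a - b)) / period);
    return variance * Math.exp((-2 * s * s) / (lengthScale * lengthScale));
  };
}

// Sums and products of valid (positive semidefinite) kernels are valid kernels.
const sum = (...ks: Kernel[]): Kernel => (a, b) =>
  ks.reduce((acc, k) => acc + k(a, b), 0);
const product = (...ks: Kernel[]): Kernel => (a, b) =>
  ks.reduce((acc, k) => acc * k(a, b), 1);

// Hypothetical monthly series: long-range trend plus a 12-step seasonal cycle.
const trend = matern52(1, 10);
const seasonal = periodic(0.5, 1, 12);
const trendPlusSeason = sum(trend, seasonal);

// Product variant: a seasonal pattern whose shape may drift slowly over time.
const driftingSeason = product(seasonal, matern52(1, 50));

console.log(trendPlusSeason(0, 0)); // 1.5: the variances of the two components add
```

A sum says the function is the superposition of independent components. A product says two inputs must be similar under both kernels to be strongly correlated.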

THE CENTRAL CLAIM

Uncertainty as output

A model that knows what it doesn't know is a different kind of tool. The confidence band isn't a hedge; it's the answer. Where it's wide, you need more data. Where it's narrow, you can act.

Bayesian inference implemented from scratch in TypeScript

The Math

Given training observations and a kernel, the posterior mean and variance at test points are computed analytically. The key operation involves the inverse of the kernel matrix, a dense n×n matrix for n observations. Computing that inverse explicitly is numerically unstable, because kernel matrices are often ill-conditioned. The standard approach is Cholesky decomposition: factor the matrix into a lower triangular matrix L such that LLᵀ equals the kernel matrix, then solve two triangular systems instead of performing one dense inversion.
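A minimal sketch of that approach in plain TypeScript follows, assuming matrices are stored as `number[][]`. `cholesky` factors a symmetric positive definite matrix row by row. Forward substitution and back substitution then give x = A⁻¹b without ever forming A⁻¹. The function names are illustrative, not taken from the original implementation. In GP code, A would be the kernel matrix plus a small diagonal term (observation noise or jitter), which keeps it positive definite in floating point.

```typescript
type Matrix = number[][];

// Row-by-row Cholesky: returns lower-triangular L with L * L^T = A.
// Throws if A is not (numerically) symmetric positive definite.
function cholesky(A: Matrix): Matrix {
  const n = A.length;
  const L: Matrix = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (let i = 0; i < n; i++) {
    for (let j = 0; j <= i; j++) {
      let sum = A[i][j];
      for (let k = 0; k < j; k++) sum -= L[i][k] * L[j][k];
      if (i === j) {
        if (sum <= 0) throw new Error("Matrix is not positive definite");
        L[i][i] = Math.sqrt(sum);
      } else {
        L[i][j] = sum / L[j][j];
      }
    }
  }
  return L;
}

// Solve L * z = b for lower-triangular L (forward substitution).
function forwardSub(L: Matrix, b: number[]): number[] {
  const z = new Array<number>(L.length).fill(0);
  for (let i = 0; i < L.length; i++) {
    let sum = b[i];
    for (let k = 0; k < i; k++) sum -= L[i][k] * z[k];
    z[i] = sum / L[i][i];
  }
  return z;
}

// Solve L^T * x = z (back substitution), reading L^T[i][k] as L[k][i].
function backSub(L: Matrix, z: number[]): number[] {
  const n = L.length;
  const x = new Array<number>(n).fill(0);
  for (let i = n - 1; i >= 0; i--) {
    let sum = z[i];
    for (let k = i + 1; k < n; k++) sum -= L[k][i] * x[k];
    x[i] = sum / L[i][i];
  }
  return x;
}

// Solve A * x = b for symmetric positive definite A without forming A^-1.
function choleskySolve(A: Matrix, b: number[]): number[] {
  const L = cholesky(A);
  return backSub(L, forwardSub(L, b));
}

console.log(choleskySolve([[4, 2], [2, 3]], [2, 1])); // [0.5, 0]
```

The factorization is the O(n³) step. Each subsequent triangular solve is O(n²), so one factor can be reused for many right-hand sides.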

Cholesky is O(n³), but for the interactive case (n ≤ 50 user-placed observations) it runs in under 5 ms in the browser with plain TypeScript: no WebAssembly, no linear algebra library. The full posterior recomputes on every observation change, which is what makes the real-time drag interaction possible.
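Combining the Cholesky solve described above with the analytic posterior gives the full computation. Here is a hedged sketch for 1-D inputs. It assumes a zero prior mean, a unit-variance RBF kernel, and a small observation-noise variance `noise` (default 1e-6, an assumed value that also acts as jitter). `gpPosterior` is a hypothetical name, not the demo's actual code. The factor is computed once per call in O(n³), and each test point then costs O(n²).

```typescript
type Kernel = (a: number, b: number) => number;

// Assumed default kernel: unit-variance RBF with length scale 1.
const rbf: Kernel = (a, b) => Math.exp(-((a - b) ** 2) / 2);

function cholesky(A: number[][]): number[][] {
  const n = A.length;
  const L = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (let i = 0; i < n; i++) {
    for (let j = 0; j <= i; j++) {
      let sum = A[i][j];
      for (let k = 0; k < j; k++) sum -= L[i][k] * L[j][k];
      if (i === j) {
        if (sum <= 0) throw new Error("Matrix is not positive definite");
        L[i][i] = Math.sqrt(sum);
      } else {
        L[i][j] = sum / L[j][j];
      }
    }
  }
  return L;
}

// Solve L z = b (forward substitution).
function forwardSub(L: number[][], b: number[]): number[] {
  const z = new Array<number>(L.length).fill(0);
  for (let i = 0; i < L.length; i++) {
    let sum = b[i];
    for (let k = 0; k < i; k++) sum -= L[i][k] * z[k];
    z[i] = sum / L[i][i];
  }
  return z;
}

// Solve L^T x = z (back substitution).
function backSub(L: number[][], z: number[]): number[] {
  const x = new Array<number>(L.length).fill(0);
  for (let i = L.length - 1; i >= 0; i--) {
    let sum = z[i];
    for (let k = i + 1; k < L.length; k++) sum -= L[k][i] * x[k];
    x[i] = sum / L[i][i];
  }
  return x;
}

// Posterior mean and pointwise variance of a zero-mean GP at test inputs.
function gpPosterior(
  xTrain: number[],
  yTrain: number[],
  xTest: number[],
  k: Kernel = rbf,
  noise = 1e-6,
): { mean: number[]; variance: number[] } {
  // K + noise * I over the training inputs.
  const K = xTrain.map((a, i) => xTrain.map((b, j) => k(a, b) + (i === j ? noise : 0)));
  const L = cholesky(K); // O(n^3), once per change to the observations
  const alpha = backSub(L, forwardSub(L, yTrain)); // (K + noise * I)^-1 y
  const mean: number[] = [];
  const variance: number[] = [];
  for (const x of xTest) {
    const kStar = xTrain.map((a) => k(a, x)); // covariances to training inputs
    mean.push(kStar.reduce((s, ki, i) => s + ki * alpha[i], 0));
    const v = forwardSub(L, kStar); // L^-1 k*, so v.v = k*^T (K + noise * I)^-1 k*
    const reduction = v.reduce((s, vi) => s + vi * vi, 0);
    variance.push(Math.max(k(x, x) - reduction, 0)); // clamp tiny negatives from rounding
  }
  return { mean, variance };
}

const { mean, variance } = gpPosterior([-1, 0, 1], [0.5, 1.0, 0.2], [0, 5]);
console.log(variance.map((v) => v.toFixed(3)).join(", ")); // 0.000, 1.000
```

The variance is near zero at x = 0, which coincides with an observation, and close to the prior value of 1 at x = 5, far from all data. Because the factor is rebuilt on every call, an interactive view could simply call such a function again whenever a point moves.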

The connection to neural networks: Neal (1996) showed that a Bayesian neural network with a single hidden layer and independent random weights converges to a Gaussian Process as the number of hidden units goes to infinity. The GP is not a relic of pre-deep-learning statistics; it is the theoretical object that wide neural networks approximate.
