Why Does the Normal Distribution Work?

A simple question with a not-so-straightforward answer. Everyone learns about the infamous “bell curve” in one way or another—but why does randomly distributed data work this way? After all, we could imagine all kinds of wonky shapes for probability distributions. The strangeness of quantum chemistry even shows us that odd-looking probability distributions can occur “naturally.” Yet there’s something deeply intuitive about the bell shape.

For example, the normal distribution is consistent with the intuition that randomly distributed data should cluster symmetrically about a mean. Probability density decays to zero as we move away from the mean in a kind of sigmoidal way: the drop is slow at first, is steepest at one standard deviation from the mean, and slows down again as p inches towards zero. Yet correspondence with our intuitions doesn’t give the normal distribution theoretical legitimacy: the hydrogenic 1s orbital has similar properties, after all. What’s so special about the normal distribution?

Although this is a question I’ve had for many years, I stumbled into the answer recently in an unexpected context: random diffusion. The answer gets right to the heart of what we mean by the word random, particularly with respect to the behavior of data (or little diffusing particles, in a diffusion context). If we imagine data points jiggling like little particles in a fluid, then random errors “nudge” each point to either the left or the right. What we really mean by “random” is that it’s impossible to predict which way a given nudge will go: left and right are equally likely (50% each). Randomly distributed data behaves just like little diffusing particles engaging in a random walk.

So we can imagine random errors as a huge collection of little unpredictable nudges in one direction or another. A whole bunch of these tiny nudges push most data points a little bit to the left or right of the mean, and a smaller number farther out. We can ask about the exact shape of the distribution of points; what we’re really asking about is the probability that a data point will land at a certain spot. Here’s where the paradigm of the random walk takes center stage: we can imagine the point taking a random walk, nudged one way or the other by random errors. A “spot” is just a number of nudges in one direction or the other (we can make the nudge size as big or small as we need it to be). The probability of exactly k nudges in one direction (say, to the right) given N total nudges is

p(k) = \binom{N}{k}(0.5)^k(0.5)^{N-k} = \frac{N!}{(N-k)!k!}(0.5)^N

In words, this equation says that the probability is the number of ways to take k nudges one way from N nudges total, times the probability of k nudges one way, times the probability of N – k nudges the other way (in math-speak, it’s a binomial distribution). The nudges are random, so the probabilities of individual nudges are all 0.5.
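To make the formula concrete, here is a minimal Python sketch that evaluates p(k) for a small number of nudges and prints a rough text histogram. The function name nudge_probability and the choice N = 10 are just illustrative assumptions, not anything special.

```python
from math import comb

def nudge_probability(k: int, n: int) -> float:
    """Probability of exactly k nudges in one direction out of n fair nudges.

    Hypothetical helper: a direct transcription of the binomial formula
    p(k) = C(n, k) * 0.5**n.
    """
    return comb(n, k) * 0.5 ** n

n = 10  # illustrative choice of total nudges
probabilities = [nudge_probability(k, n) for k in range(n + 1)]

# Print each probability with a crude bar, one '#' per percentage point.
for k, p in enumerate(probabilities):
    print(f"k = {k:2d}  p = {p:.4f}  " + "#" * round(p * 100))
```

Even at N = 10 the bars pile up around k = N/2 and thin out symmetrically on both sides, which is the first hint of the bell shape.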

Plotting p versus k doesn’t quite give us the expected normal distribution for at least two reasons. First, p is written in terms of k nudges in one direction; the mean will thus sit at k = N/2, rather than zero (or wherever we want it to sit; zero is a simple choice). The solution is simply to move and re-label the axes so that x = 0 is the spot where k = N/2: x = k – N/2.
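The re-labeling can also be checked by brute force. The sketch below simulates many data points, each taking N fair nudges, and tallies where they land after shifting to x = k – N/2. The function name simulate_walks, the seed, and the particular values of N and the number of points are assumptions made only for illustration.

```python
import random
from collections import Counter

def simulate_walks(n_nudges: int, n_points: int, seed: int = 0) -> Counter:
    """Tally the re-centered positions x = k - n_nudges/2 of many random walkers.

    Hypothetical helper: k counts the nudges in one direction, each of which
    happens with probability 0.5.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(n_points):
        k = sum(rng.random() < 0.5 for _ in range(n_nudges))
        counts[k - n_nudges / 2] += 1
    return counts

counts = simulate_walks(n_nudges=20, n_points=10_000)

# Crude text histogram: one '#' per 50 walkers landing at each spot.
for x in sorted(counts):
    print(f"x = {x:+5.1f}  " + "#" * (counts[x] // 50))
```

With the shift applied, the pile of points should sit roughly centered on x = 0 instead of on k = N/2.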

The second reason is in many ways more insidious: this distribution is confined to a finite part of the number line because k cannot exceed N, but the normal distribution extends over the whole number line. Turns out there’s a simple fix here too: simply let N, the total number of nudges, go to infinity, shrinking the nudge size along the way so that the overall width of the distribution stays put. Take my word for it that as N gets larger and larger, the distribution gets closer and closer to perfectly normal. (This can be demonstrated by a mathematical derivation; the result is known as the de Moivre–Laplace theorem, a special case of the central limit theorem.)
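If you would rather not take my word for it, a quick numerical check is possible. A binomial distribution of N fair nudges has mean N/2 and standard deviation sqrt(N)/2, so the sketch below compares p(k) against a normal density with that same standard deviation, evaluated at x = k – N/2, and reports the largest gap between the two. The helper names (binomial_pmf, normal_pdf, max_gap) and the list of N values are illustrative assumptions.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k: int, n: int) -> float:
    """p(k) for n fair nudges; integer division keeps large n accurate."""
    return comb(n, k) / 2 ** n

def normal_pdf(x: float, sigma: float) -> float:
    """Density of a normal distribution with mean 0 and standard deviation sigma."""
    return exp(-x * x / (2 * sigma * sigma)) / (sigma * sqrt(2 * pi))

def max_gap(n: int) -> float:
    """Largest absolute difference between p(k) and the matching normal density."""
    sigma = sqrt(n) / 2  # standard deviation of the binomial with p = 0.5
    return max(
        abs(binomial_pmf(k, n) - normal_pdf(k - n / 2, sigma))
        for k in range(n + 1)
    )

for n in (4, 16, 64, 256):
    print(f"N = {n:4d}  largest gap = {max_gap(n):.6f}")
```

The gap should shrink steadily as N grows, which is the numerical face of the claim that the distribution approaches a perfect normal curve.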

More important than the mathematical result is the intuitive idea that random errors can be effectively modeled by a random walk. The notion seems perfectly obvious in retrospect (well of course random errors are random!). Still, if you’re like me and you’ve never really thought about where the normal distribution comes from, this realization can be a eureka moment!

