One of my favorite books is Nate Silver’s *The Signal and the Noise*. Silver is well known as the founder of fivethirtyeight.com, a data-driven news site covering everything from the American economic outlook to the ethnic distribution of NBA fans. In his book, Silver describes his philosophy of prediction and champions Bayesian reasoning. He sensibly asserts that a firm understanding of statistics and probability is essential for making good predictions. Reading the book leaves me wondering about the intersection of statistics, probability, and chemistry.

Of course, chemical theory owes a great deal to statistics and probability. Quantum mechanics is an inherently probabilistic theory, although the concrete orbital shapes organic chemists tend to draw tempt us to think otherwise. Statistical mechanics is built on the idea that a collection of trillions upon trillions of molecules behaves like a massive sample. No social science experiment could ever hope to approach our sample size! In this context, the challenge is not collecting enough data but developing a theory that fits our clearly high-quality samples (and the theory of statistical mechanics is notoriously complex).

In other areas of chemistry, however, probability and statistics are unfortunately absent. Chemical reactions and synthesis come to mind. One can imagine a reaction system as governed by a set of probabilities, one for each reaction that might occur. The distribution of products formed before isolation depends on these probabilities. When it comes to organic chemistry, most compounds of appreciable size contain multiple functional groups, each of which is susceptible to reaction with sufficiently harsh reagents. Methods development and synthetic planning both involve minimizing the probability of undesirable processes, even if their likelihoods cannot be reduced to exactly zero. Using computer programs to aid synthesis has fallen out of fashion (unless SciFinder counts), but I can imagine a next-generation synthesis program as a Watson-esque guide that lays out several different routes with probabilities of success or “optimal-ness,” based on data from the literature.*