Bayesian Chemistry: How Chemical Experts Improve

One of my favorite books is Nate Silver’s The Signal and the Noise. Silver is well known as the founder of FiveThirtyEight, a data-driven news site covering everything from the American economic outlook to the ethnic distribution of NBA fans. In his book, Silver describes his philosophy of prediction and champions Bayesian reasoning. He sensibly asserts that a firm understanding of statistics and probability is essential for making good predictions. Reading the book left me wondering about the intersection of statistics, probability, and chemistry.

Even today, I would argue that statistics and probability are underappreciated in organic chemistry.

Of course, chemical theory owes a great deal to statistics and probability. Quantum mechanics is an entirely probabilistic theory, although the concrete orbital shapes organic chemists tend to draw tempt us to think otherwise. Statistical mechanics is built on the idea that a collection of trillions upon trillions of molecules behaves like a massive sample. No social science experiment could ever hope to approach our sample size! Here the challenge is not collecting enough data but developing a theory that fits these enormous samples (and the theory of statistical mechanics is notoriously complex).

In other areas of chemistry, however, probability and statistics are unfortunately absent. Chemical reactions and synthesis come to mind: one can imagine a reaction system as governed by a set of probabilities—one for each reaction that might occur. The distribution of products formed before isolation depends on these probabilities. When it comes to organic chemistry, most compounds of appreciable size contain multiple functional groups, each of which is susceptible to reactions with sufficiently harsh reagents. Methods development and synthetic planning both involve minimizing the probability of undesirable processes—even if their likelihoods cannot be reduced to exactly zero. Using computer programs to aid synthesis has fallen out of fashion (unless SciFinder counts), but I can imagine a next-generation synthesis program as a Watson-esque guide that lays out several different routes with probabilities of success or “optimal-ness,” based on data from the literature.*
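One standard way to attach numbers to "a set of probabilities, one for each reaction" is kinetic control. The sketch below assumes parallel, irreversible pathways from the same substrate with the same reaction order, so each rate constant follows the Eyring equation with a shared prefactor and product i makes up k_i / Σk of the mixture. The barrier values are hypothetical, chosen only for illustration, and the function name `product_distribution` is likewise made up for this example.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def product_distribution(barriers_kj, temperature_k=298.15):
    """Fraction of each product for parallel, irreversible pathways under kinetic control.

    Assumes all pathways share the same substrate, reaction order, and Eyring
    prefactor, so k_i is proportional to exp(-dG_act_i / RT) and the share of
    product i is k_i / sum(k).
    """
    weights = [math.exp(-g / (R * temperature_k)) for g in barriers_kj]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical barriers (kJ/mol): desired pathway vs. a side reaction 10 kJ/mol higher.
clean = product_distribution([80.0, 90.0])
# The same system after a "tweak" that lowers the side reaction's barrier by 5 kJ/mol.
tweaked = product_distribution([80.0, 85.0])

print(f"side product, 10 kJ/mol gap: {clean[1]:.1%}")
print(f"side product,  5 kJ/mol gap: {tweaked[1]:.1%}")
```

Under these assumptions the side product rises from roughly 2% to roughly 12% when its barrier drops by only 5 kJ/mol. Because the dependence on the barrier is exponential, a minor pathway's likelihood can "skyrocket" after what looks like a small change.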

Perhaps the next generation of synthesis programs will use probabilistic predictions?

All students of chemistry probably develop visual triggers that tip them off to the dominance of a particular reaction or phenomenon. Where experts shine is in the recognition that a dominant process is not necessarily the only game in town: tweaks here or there may cause the likelihood of a different process to skyrocket. Opening up one’s mind to the possibility of alternative reaction pathways or phenomena is part of becoming an expert: “expect the unexpected!” Spend enough time around an expert chemist, and she’ll eventually blow you away with a penetrating insight that you never saw coming, but that makes perfect sense in retrospect. How does she do it? Part of that expertise is maintaining an awareness of the probabilities associated with different outcomes, and recognizing the signs that increase small probabilities.

My labmate during my first year in grad school had one such experience that I still remember. He was making an allylsilane for studies of silicon-based cross-coupling, and despite multiple columns, the NMR spectrum of the compound looked strange. When he was nearly at his wits’ end, the boss surprised him with a curveball: the allylsilane had multiple diastereomeric conformations. A couple of variable-temperature (VT) NMR experiments later, the boss’s hypothesis was confirmed. If I’m remembering correctly, the two conformations were interacting with one another, complicating the analysis even further!

Persistent diastereomeric conformations might be relevant to something like 1% of all organic reactions. Still, the boss recognized the signals that increased the likelihood of this phenomenon, refining his estimate of the probability of its relevance upward as new information became available. Ultimately he made a prediction that very well could have been wrong, but had a reasonably high probability of being right. This is Bayesian reasoning in action!
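The boss's reasoning can be sketched as a sequence of Bayes' theorem updates, P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) (1 − P(H))]. In this minimal sketch the prior is the rough 1% figure above. The pieces of evidence and every likelihood are hypothetical, illustrative numbers rather than anything measured, and the pieces of evidence are assumed to be conditionally independent so the updates can be chained.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) from Bayes' theorem for a binary hypothesis H."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator

# H: the odd spectrum comes from persistent diastereomeric conformations.
belief = 0.01  # prior: the rough "1% of cases" guess

# Hypothetical evidence: (description, P(E | H), P(E | not H)); illustrative values only.
evidence = [
    ("spectrum still looks strange after multiple columns", 0.9, 0.2),
    ("extra signals look like a second set from the same compound", 0.8, 0.1),
    ("VT NMR: signals coalesce on heating", 0.9, 0.05),
]

for description, p_if_true, p_if_false in evidence:
    belief = bayes_update(belief, p_if_true, p_if_false)
    print(f"{description}: P(H) = {belief:.2f}")
```

With these made-up numbers the belief climbs from 1% to about 4%, then about 27% (a hypothesis worth testing), and finally about 87% after the VT experiment. The exact values matter less than the pattern: each observation multiplies the odds, so a rare explanation can become the leading one.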

Part of the draw of returning over and over again to the chemical literature, I think, is the thirst of experts for a finer appreciation of the probabilities of chemical phenomena. Few papers break new conceptual ground or cause us to think about a set of experiments in a completely new way. On the contrary, most papers add to growing paradigms and help us understand how small tweaks affect experimental results. This idea of establishing a mental paradigm and tweaking it as new information comes to light is exactly the process of Bayesian reasoning.

In the classroom, I don’t think organic chemists do a good job of teaching the paradigm in many cases. Resonance, for example, is a fairly systematic and orderly concept. Lone pairs, π bonds, and electron sinks (carbocations, empty orbitals) linked together imply electronic delocalization, which resonance structures are designed to show. Instead of laying out the general paradigm, we often inundate the student with examples and then deal with a flurry of questions in the aftermath, dancing around general principles but never drawing general figures. Learning by induction isn’t the right way to learn resonance, in my opinion. Make sure everyone is aware of the basic scheme in class, I say, then let students apply the general paradigm to examples outside of the classroom.

* The unreliable nature of chemical literature is, of course, an issue…
