What was your least favorite part of your science education? Many people who only took a few science courses in high school or occasionally in college might name entire fields (like the one I hear the most, “chemistry”). My least favorite part of science is more specific than that, and it’s not even close: *error* blew all competition out of the water. I hated all things error (including the king of terrible error-related things, *error propagation*) with a burning, blinding passion. Reflecting on those feelings now makes me realize how far I’ve come intellectually since then, so it felt like a natural topic to write about. One hears a lot about intellectual development in books on teaching and student learning these days, and I’ve discovered that my experience with error mirrors some of the theories out there. Maybe they’re on to something…!

I was an idealist in college (and still am today, to a large degree). Really, I started out as what one might call a “deterministic idealist.” Science described the world in deterministic terms in the form of equations, and if I learned these equations I would become privy to all the secrets of the universe. A particular system (experiment, say) either conformed to one of these immutable equations or didn’t, and if it didn’t, there was another, deeper equation that just hadn’t been written down yet that could describe the system. Hence, my hatred of the idea of error: *“I get it. Experiments are subject to error. That’s not the important part of this experiment; why the hell are we worrying about it?!”* Take the compressibility factor of a gas: why do we care about error when what matters is measuring *P*, *V*, *n*, and *T* and calculating *Z*?

That haughty attitude persisted well into my early years in graduate school. It corresponds roughly to the first stage of Perry’s process of intellectual development, dualism. “To solve a problem in the lab, all I really need to do is measure the relevant stuff and apply the Right Equation(s) to The Measurement to obtain The Answer.” The deferential capitalization is intentional! I would get so angry propagating error because it felt utterly irrelevant to The Answer. It was nothing but busy work!

That perspective seems so immature and proud in retrospect…what about uncertainty? Why trust your instruments? Why trust your own shaky hands, or your blurry eyes? Somewhere during my education—well after undergrad, mind you—I figured out that the world doesn’t operate in black and white. Uncertainty is a fact of life, and it needs to be accounted for. This viewpoint is more like Perry’s third level, relativism. “Answers need to be backed up by good reasoning (and good scientific reasoning must take error into account).” Better, I thought. I had learned how to do error propagation in college, but I never really understood *why it worked*. Honestly, because of my intellectual level back then, I doubt I was even *capable* of learning how it worked. A frightening thought! I remember blindly applying the formula, but never fully wrapped my mind around it.
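For reference, here is the formula in question as it is usually written for a quantity *y* that depends on three independent variables. This is a reconstruction based on the description that follows, so the exact typesetting is an assumption:

```latex
\varepsilon_y \approx \sqrt{
  \left(\frac{\partial y}{\partial x_1}\right)^{2}\varepsilon_1^{2}
+ \left(\frac{\partial y}{\partial x_2}\right)^{2}\varepsilon_2^{2}
+ \left(\frac{\partial y}{\partial x_3}\right)^{2}\varepsilon_3^{2}
}
```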

A few words about the formula. *y* is the final value of interest (say the compressibility factor *Z*) and is a function of three variables, *x*_{1}, *x*_{2}, and *x*_{3}. Each ε is the uncertainty associated with a variable; we might pull the uncertainties of the independent variables from instrument manuals, the 95% confidence intervals of several measurements, or the markings on our measuring devices. ε_{y} is what we really want: the uncertainty in the dependent variable. How might we go about finding it? Well, we can start by asking what happens when *y* changes a teeny tiny bit due to tiny changes in the *x*’s. Assuming that only the *x*’s are changing and that the *x*’s are independent, the total differential of *y* captures all the changes in *y* due to changes in the *x*’s. If we think about the *dx*’s as the uncertainties in each independent variable, isn’t the uncertainty in *y* just *dy*? In a worst-case scenario in which all the deviations of the *x*’s are maximal and all push *y* in the same direction, the deviation in *y* would be |*dy*|, the sum of the absolute values of the individual terms.
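Written out for three variables, the total differential and the worst-case estimate described above look like this. It is a sketch in the same notation; the label "worst" is mine, and the second line simply replaces each *dx* with the corresponding ε and takes absolute values:

```latex
dy = \frac{\partial y}{\partial x_1}\,dx_1
   + \frac{\partial y}{\partial x_2}\,dx_2
   + \frac{\partial y}{\partial x_3}\,dx_3

\varepsilon_{y,\text{worst}} =
     \left|\frac{\partial y}{\partial x_1}\right|\varepsilon_1
   + \left|\frac{\partial y}{\partial x_2}\right|\varepsilon_2
   + \left|\frac{\partial y}{\partial x_3}\right|\varepsilon_3
```

For the compressibility factor, assuming the usual definition *Z* = *PV*/(*nRT*) with *R* treated as exact, there are four measured variables instead of three, and each partial derivative is just ±*Z* divided by that variable. The worst-case estimate then collapses to a sum of relative uncertainties:

```latex
\frac{\varepsilon_{Z,\text{worst}}}{Z}
  = \frac{\varepsilon_P}{P} + \frac{\varepsilon_V}{V}
  + \frac{\varepsilon_n}{n} + \frac{\varepsilon_T}{T}
```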

This is a decent estimate for the uncertainty in *y*, actually, but it isn’t the *best* one. It tends to overestimate the uncertainty, which is easy to verify if we send fifty undergrads into a lab, one at a time, to measure the compressibility factor of a single sample of gas. The 95% confidence interval in *Z* that comes from the undergrads’ data (just using the *Z* values and ignoring how they were calculated) will almost always be smaller than the interval predicted by the total differential. Why?! This is worth thinking about; you really understand uncertainty if you can explain this tendency of *dy* to be too harsh an estimate.
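If you don’t have fifty undergrads handy, the thought experiment can be roughed out in a few lines of Python. This is only a sketch under assumptions of my own: *Z* = *PV*/(*nRT*), made-up "true" values and instrument uncertainties, and independent, normally distributed measurement errors. The names `predicted_uncertainties` and `simulated_spread` are hypothetical helpers. It compares standard deviations rather than 95% intervals, which rescales every number by the same factor and so doesn’t change the comparison.

```python
import math
import random
import statistics

R = 0.082057  # gas constant in L·atm/(mol·K)

# Hypothetical true values and 1-sigma instrument uncertainties (assumptions).
TRUE = {"P": 1.00, "V": 24.0, "n": 1.00, "T": 298.15}   # atm, L, mol, K
SIGMA = {"P": 0.01, "V": 0.2, "n": 0.005, "T": 0.5}


def compressibility(P, V, n, T):
    """Compressibility factor Z = PV / (nRT)."""
    return P * V / (n * R * T)


def predicted_uncertainties():
    """Return (worst_case, quadrature) estimates of the uncertainty in Z."""
    Z = compressibility(**TRUE)
    # For a pure product/quotient like Z, |dZ/dx| * sigma_x = Z * sigma_x / x.
    terms = [Z * SIGMA[k] / TRUE[k] for k in TRUE]
    worst_case = sum(terms)                             # total differential
    quadrature = math.sqrt(sum(t * t for t in terms))   # square-root formula
    return worst_case, quadrature


def simulated_spread(n_students=50, seed=0):
    """Standard deviation of Z over simulated independent measurements."""
    rng = random.Random(seed)
    zs = []
    for _ in range(n_students):
        # Each student draws every variable independently around its true value.
        measured = {k: rng.gauss(TRUE[k], SIGMA[k]) for k in TRUE}
        zs.append(compressibility(**measured))
    return statistics.stdev(zs)


if __name__ == "__main__":
    worst_case, quadrature = predicted_uncertainties()
    print(f"total differential (worst case): {worst_case:.4f}")
    print(f"square-root formula:             {quadrature:.4f}")
    print(f"spread from 50 'students':       {simulated_spread():.4f}")
```

With only fifty simulated students the observed spread is itself noisy, so it is worth rerunning with a few different seeds or a larger `n_students` before drawing conclusions.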

The issue is that the contributions to *dy* from different variables may cancel, and in fact will tend to cancel or add up to relatively small sums if the *x*’s are independent. Another way of saying “*x*_{1} and *x*_{2} are independent” is “*x*_{1} and *x*_{2} don’t vary together”: a big *dx*_{1} tells us nothing about *dx*_{2}, so both being big at once, and pushing *y* in the same direction, is unlikely! Feeling the full effects of the total differential is thus fairly unlikely. More moderate *dy*’s are to be expected.

Instead of using absolute value, let’s look at the total differential from another, equivalent perspective. Conveniently, |*x*| = (*x*^{2})^{1/2}. Let’s apply this formula to the total differential…
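Carrying out that substitution, with each *dx* replaced by its ε, gives something like the following. This is a reconstruction in the notation used so far, not necessarily the exact layout of the original derivation:

```latex
\varepsilon_y
  = \sqrt{\left(
      \frac{\partial y}{\partial x_1}\varepsilon_1
    + \frac{\partial y}{\partial x_2}\varepsilon_2
    + \frac{\partial y}{\partial x_3}\varepsilon_3
    \right)^{2}}

\varepsilon_y
  = \sqrt{
      \left(\frac{\partial y}{\partial x_1}\right)^{2}\varepsilon_1^{2}
    + \left(\frac{\partial y}{\partial x_2}\right)^{2}\varepsilon_2^{2}
    + \left(\frac{\partial y}{\partial x_3}\right)^{2}\varepsilon_3^{2}
    + 2\,\frac{\partial y}{\partial x_1}\frac{\partial y}{\partial x_2}\,\varepsilon_1\varepsilon_2
    + 2\,\frac{\partial y}{\partial x_1}\frac{\partial y}{\partial x_3}\,\varepsilon_1\varepsilon_3
    + 2\,\frac{\partial y}{\partial x_2}\frac{\partial y}{\partial x_3}\,\varepsilon_2\varepsilon_3
    }
```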

I don’t know about you, but seeing that bottom line makes me want to open wide and shove my face into a banana cream pie. It looks nasty, but bear with me…the first three terms are just the first three terms under the square root in The Formula above! So now we see mathematically where the total differential’s overestimation comes from: all those “cross terms” in the expanded square contribute. Evidently, all of these terms are small enough to ignore when calculating the uncertainty in *y*. Again I ask, why is this the case? My dualistic brain would have been at a loss here, but I can’t resist digging into this just a little bit more. If you’ve made it this far, props!

Let’s re-ask at this point a question we’ve already asked: how do the ε’s vary with one another? The correlation of any ε with itself is obviously perfect, so the ε^{2} factors in the first three terms will have some meat on their bones (to prove it to yourself, finish this sentence: “when ε_{1} is large, ε_{1} is…”). What about the cross terms? Well, since the independent variables are, erm, independent, their ε’s must be independent too. So, when one ε is extremely large (a deviation with, say, a 2% chance of occurring), the other ε’s are probably much smaller (somewhere within their 95% confidence intervals, closer to zero). The *product* of a large value with a small value shakes out to a small value, almost certainly smaller than the square of the large one. What’s more, the deviations behind the ε’s can be positive or negative, so across many measurements the cross products are as likely to be negative as positive and tend to cancel, while the squares are always positive. Hence, all the terms containing one ε multiplied by a different ε are likely to be small. The last three terms under the square root drop out and we’re left with The Formula! The assumption that cross-products of the ε’s with one another will be small isn’t perfect; that’s why the squiggly equals sign is used.
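A quick numerical check makes the argument about squares versus cross terms concrete. The sketch below assumes two independent, zero-mean, normally distributed deviations (my assumption, not something the derivation requires in this exact form), and `average_terms` is a hypothetical helper name. It averages ε_{1}^{2} and ε_{1}ε_{2} over many draws.

```python
import random
import statistics


def average_terms(sigma1=1.0, sigma2=1.0, trials=100_000, seed=1):
    """Average the squared term and the cross term over many independent draws.

    Returns (mean of e1*e1, mean of e1*e2), where e1 and e2 are independent
    zero-mean Gaussian deviations with the given standard deviations.
    """
    rng = random.Random(seed)
    squares = []
    crosses = []
    for _ in range(trials):
        e1 = rng.gauss(0.0, sigma1)
        e2 = rng.gauss(0.0, sigma2)
        squares.append(e1 * e1)   # always non-negative, so it accumulates
        crosses.append(e1 * e2)   # signed, so positive and negative draws cancel
    return statistics.fmean(squares), statistics.fmean(crosses)


if __name__ == "__main__":
    mean_square, mean_cross = average_terms()
    print(f"average of e1*e1: {mean_square:.4f}")
    print(f"average of e1*e2: {mean_cross:.4f}")
```

The squared term should settle near `sigma1` squared while the cross term hovers near zero. If the two deviations were correlated (say, both instruments drifting with room temperature), the cross term would no longer average out, which is one more reason for the squiggly equals sign.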

This post has rambled on long enough, but I’d be lying if I said I didn’t find error propagation fascinating these days. My former self would be appalled, so I suppose I’m an “error convert”! It’s a sobering reminder that intellectual development is a real process that teachers need to think about. Perry’s theory has been superseded by a number of more nuanced models, but it’s worth a look if you’re interested (see here for more).

Dude, you’re a lot smarter than me.

“I must warn you, I am susceptible to flattery…” :-)