Teaching Statistics in High School

There is an interesting discussion about a topic I know especially little about: K-12 education. Within it, there is a narrow discussion about whether it makes sense to try to teach statistics to people who don’t know calculus. This is a clear question. However, the people who discuss it seem to skip a much more basic question, namely why mathematical statistics should be taught to high school students, and an even more basic one: what do statisticians have to teach us?

Links and snippets.

Kareem Carr tweeted

Which causes me to ask how exactly calculus helps us develop valid statistical tools.

While mostly writing about other things, Matthew Yglesias wrote

“But one big problem with this idea, as Kareem Carr noted, is you can’t really teach statistics properly without calculus, which I think goes to underscore that this is not really a debate about the proper sequencing of math classes. It’s instead another manifestation of the dysfunctional tendency in some edu-left circles to stigmatize all efforts at measurement. If you sort kids into different math tracks based on their test scores, that might reveal that Black and Latino kids are doing worse than white and Asian ones. If you refuse to sort, you can pretend you’ve achieved equality. But will you provide useful education?”

Will you? Can you teach statistics properly without calculus? Is the effort to develop useful mathematical statistics at the point where one can “teach statistics properly”? I stress that I think the accomplishments of mathematical statisticians are immense and immensely important. However, I also think there is a risk of keeping things simple by making strong false assumptions and teaching students things which just aren’t true.

I agree in part (and disagree in part) with Brad DeLong, who wrote, mostly in response to Carr,

“Here is the fact: the rules of statistics are really arbitrary. The formulas do come out of nowhere.

The formulas come out of the brain of Carl Friedrich Gauss. The formulas for average (rather than median) and for least-squares linear fit make sense if the idea is to minimize the sum-of-squares of the retrodiction errors, which makes sense if the distribution of disturbances to some underlying linear true relationship is: [Gaussian]”
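DeLong’s point about the average can be checked numerically. This sketch uses made-up numbers (nine ordinary values and one large one) and a brute-force grid search, both illustrative assumptions: the value that minimizes the sum of squared deviations is the average, the value that minimizes the sum of absolute deviations is the median, and only the first is dragged toward the large value.

```python
import statistics

# Made-up data for illustration: nine ordinary values and one large disturbance.
data = [3.0, 4.0, 4.0, 5.0, 5.0, 5.0, 6.0, 6.0, 7.0, 55.0]

def sum_of_squares(c, xs):
    return sum((x - c) ** 2 for x in xs)

def sum_of_absolute_deviations(c, xs):
    return sum(abs(x - c) for x in xs)

# Brute-force search over candidate locations 0.00, 0.01, ..., 60.00.
grid = [i / 100 for i in range(6001)]
least_squares_location = min(grid, key=lambda c: sum_of_squares(c, data))
least_absolute_location = min(grid, key=lambda c: sum_of_absolute_deviations(c, data))

print(least_squares_location, statistics.mean(data))     # 10.0 10.0
print(least_absolute_location, statistics.median(data))  # 5.0 5.0
```

Which of the two is the “right” summary depends on which loss one cares about and on where the data came from; neither formula settles that by itself.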

I suspect that high school (and college and graduate) students are taught formulas which are valid only under very strong assumptions or asymptotically (note the slogan of my defunct personal blog “asymptotically we will all be dead”).

But really the phrase “The formulas” refers to a whole lot of formulas, only some of which follow from the assumption that some stochastic variable is normally distributed or that the sample size is large enough (with no analysis of how large is large enough or how to tell if we have enough data).

High school students certainly should not be taught that the right way to estimate the location of a population is to calculate the mean of a sample. Stated without qualifications, that is simply false. They should also not be taught that the right way to estimate the mean, mode, and median of a normal distribution is to calculate a sample average. That is a true mathematical statement, but it too strongly suggests that this case is very often useful, without giving any hint about how to find out when it is useful.
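The claim that the sample mean is not automatically the right location estimate can be shown in a few lines. This sketch assumes a standard Cauchy population, a deliberately heavy-tailed choice made for illustration. Across 500 replications, the spread of the sample mean does not shrink as n grows, while the spread of the sample median does.

```python
import math
import random
import statistics

random.seed(1)

def cauchy_draw():
    # Inverse-CDF draw from a standard Cauchy distribution centred at 0.
    return math.tan(math.pi * (random.random() - 0.5))

def interquartile_range(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def spread(estimator, n, reps=500):
    # How much an estimator varies across `reps` independent samples of size n.
    estimates = [estimator([cauchy_draw() for _ in range(n)]) for _ in range(reps)]
    return interquartile_range(estimates)

results = {n: (spread(statistics.fmean, n), spread(statistics.median, n))
           for n in (10, 100, 1000)}
for n, (mean_spread, median_spread) in results.items():
    print(f"n={n}: IQR of sample means {mean_spread:.2f}, of sample medians {median_spread:.3f}")
```

Under a normal population both spreads would shrink; the contrast comes entirely from the assumed distribution.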

I am quite sure that one shouldn’t introduce statistics by teaching about what one would do if one were to know that stochastic variables are normally distributed. That is a worthwhile topic (or at least I hope so as I have spent a fair amount of time considering the distributions of estimators and test statistics under the assumption of normality). But it seems to me to be a very bad thing to present it as a useful first step.

Brad goes on

“We, today, rest [our confidence] on the Central Limit Theorem and the convergence-in-distribution of a sum of independent random variables to a Gaussian distribution if [one of many sufficient conditions is assumed].

In all this, knowing calculus, in the sense of having it in your intellectual panoply, is very useful for getting from [one of many sufficient assumptions] to the Central Limit Theorem. Knowing calculus is very useful in getting from the Gaussian distribution to the optimality of taking-averages and least-squares. [skip]

And here’s the crux: in the real world of using statistics, where your sample is finite, where one or a few of the disturbances are large relative to the total, where your sample is non-random, where your observations are not independent, the Central Limit and Gauss-Markov theorems are of little use. Yes, taking averages and least-squares fits are the first things you should do. But then you should not down tools, because of calculus! Then you should do other things and see if they agree with taking-averages and least-squares. And if they do not you should think hard about the problem.

Statistics is, IMHO, better taught as if you are teaching engineers rather than mathematicians. And I think that California is probably right in wanting to put engineering-focused statistics before calculus in the sequence.”

I absolutely agree with this. I think it is important.
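DeLong’s advice to try other things and see whether they agree with least squares can be sketched as follows. The data are simulated (a true line y = 2 + 0.5x with small noise, plus one disturbance that is large relative to the rest), and the alternative estimator is Theil-Sen, the median of all pairwise slopes. Both are illustrative assumptions rather than anything in the quoted post.

```python
import random
import statistics

random.seed(2)

# Hypothetical data: true line y = 2 + 0.5 x, Gaussian noise with sd 0.5.
xs = list(range(20))
ys = [2.0 + 0.5 * x + random.gauss(0.0, 0.5) for x in xs]
ys[-1] += 50.0  # one disturbance that is large relative to the total

def ols_slope(xs, ys):
    # Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def theil_sen_slope(xs, ys):
    # Median of the slopes through every pair of points with distinct x.
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs)) for j in range(i + 1, len(xs))
              if xs[j] != xs[i]]
    return statistics.median(slopes)

ols = ols_slope(xs, ys)
ts = theil_sen_slope(xs, ys)
print(f"least squares slope: {ols:.3f}, Theil-Sen slope: {ts:.3f}")
```

When the two estimates disagree this much, that is the signal to stop and think hard about the problem.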

Again, before going on: I don’t think it is fair to statisticians to claim that they rely on the central limit theorem and the Gauss-Markov theorem. Who is this “we,” Brad? I sure don’t rest my confidence on them. I don’t think there are many mathematical statisticians who rest their confidence (if any) on them.

But, while I think Brad conflates current research statistics with a possible bad introductory statistics course, I think he describes a very important problem. I recall Noah Smith (following others, I am sure) denouncing economics 101 ism. This is an ideology based on the assertion that economists consider the very first models we teach to be useful approximations to reality. It is actually unfair to really existing economics 101 courses, which go on past those first models. It is more accurately described as “first two months of economics 101 ism”. But it is definitely a problem. In particular, there is a problem with teaching perfect competition first (and rationality, and symmetric information, and static models or models with complete markets, and … I shouldn’t have gotten myself started on that).

I will now try to get to an actual point. I have some thoughts about some things which everyone should be taught and they do not include calculus (trying to teach everyone calculus would be impractical anyway).

I think it is important to try to teach people about probability. I think it is possible, not easy, and useful to teach people Bayes’ formula, present cases in which it is useful, and prove that it is useful in those cases. People do not, in fact, think about probabilities and conditional probabilities in any way which could be rational, nor do they make choices under uncertainty which are arguably optimal. I am fairly confident that people can be convinced of this and convinced to guard against misleading heuristics.
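A minimal sketch of the kind of case where Bayes’ formula earns its keep, using hypothetical numbers for a screening test: 1% prevalence, 90% sensitivity, and a 9% false positive rate. Exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction

def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) by Bayes' formula."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical screening test: 1% prevalence, 90% sensitivity, 9% false positives.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(9, 100))
print(p, float(p))  # roughly 0.09: most positives are false positives
```

In natural frequencies: of 1,000 people, 10 have the condition and 9 of them test positive, while about 89 of the 990 without it also test positive, so only about 9 of every 98 positives are real.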

I think it is useful to teach people about summary statistics (because they are presented all the time to the public) and the risks of relying on means and variances. The cases in which the median is a better estimate of the location of a random variable are not rare and should not be presented as if they are obscure special cases.

I think it is important to teach about causal inference and valid instruments. Here it seems that people have been taught that correlation is not causation. That is a good thing. Also, post hoc is not propter hoc. However, it is also true that some data sets described as natural experiments really are natural experiments. I think the (no longer so few) cases of valid causal inference based on non-experimental data are fairly easy to understand. I think a good rule is:

“If you can’t follow the data analyst’s argument for why his calculation is worthy of your interest, it probably isn’t.”
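Why correlation is not causation, and why random assignment helps, can be shown with a simulated world in which, by construction, a hidden factor Z drives both X and Y while X has no effect on Y. The data generating process is a hypothetical chosen for illustration.

```python
import random
import statistics

random.seed(3)
N = 10_000

def correlation(a, b):
    # Pearson correlation coefficient.
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Hypothetical world: hidden factor Z drives both X and Y; X has no effect on Y.
z = [random.gauss(0, 1) for _ in range(N)]
x_observed = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

# "Experiment": X is assigned at random, independently of Z.
# Y's data generating process does not involve X, so y is unchanged.
x_randomized = [random.gauss(0, 1) for _ in range(N)]

r_observational = correlation(x_observed, y)
r_randomized = correlation(x_randomized, y)
print(round(r_observational, 2), round(r_randomized, 2))
```

The observational correlation is about 0.5 even though X does nothing; once X is assigned at random, it disappears.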

The lesson that mathematics can be used to cover up reliance on strong implausible assumptions is, I think, very useful. It is the opposite of the lesson that you can obtain a basic understanding of mathematical statistics by considering normally distributed variables.

I think it is important to teach people that they shouldn’t trust a model without checking whether it forecasts well out of sample. This is easy to teach with simple examples. It is important.
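A simple out-of-sample check, assuming a hypothetical data generating process (y = 1 + 2x plus noise). The “memorizer”, a 1-nearest-neighbour rule that predicts the y of the closest training point, fits the training data perfectly; only new data reveal that it forecasts worse than a straight line.

```python
import random

random.seed(4)

def make_data(n):
    # Hypothetical DGP: x uniform on [0, 1), y = 1 + 2x + Gaussian noise with sd 1.
    xs = [random.random() for _ in range(n)]
    ys = [1 + 2 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # Ordinary least squares line.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_memorizer(xs, ys):
    # Returns the y of the closest training x, so it reproduces every training point.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50)
test_x, test_y = make_data(2000)

errors = {}
for name, fit in [("straight line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    errors[name] = (mean_squared_error(model, train_x, train_y),
                    mean_squared_error(model, test_x, test_y))
    print(f"{name}: in-sample MSE {errors[name][0]:.3f}, "
          f"out-of-sample MSE {errors[name][1]:.3f}")
```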

I think it is easy to present Monte Carlo simulations. Here the teacher says: “I will make the computer generate some pseudo-random data. Notice that I have to tell it something very specific. Anything we learn may depend on that specific assumption. Now I tell it to calculate this summary statistic, point estimate, or test statistic. Now we draw 10000 pseudo samples. Here’s what comes out.”

That would give the students the impression that statisticians rely on a black box, and that they are doing numerical experiments, which have the fault of mathematics (all conclusions are based on assumptions) without the elegance. One advantage of giving students this impression is that I think it is 100% accurate.
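The lecture described above can be sketched as follows. The two data generating processes are assumptions chosen for illustration: a standard normal, and a contaminated normal in which 10% of draws come from a distribution ten times wider. For each, the computer draws 10000 pseudo samples of size 25 and records how much the sample mean and the sample median vary.

```python
import random
import statistics

random.seed(5)
N_SAMPLES = 10_000  # number of pseudo samples
N = 25              # observations per sample

def normal_dgp():
    return random.gauss(0, 1)

def contaminated_dgp():
    # 90% of draws are N(0, 1); 10% come from a much wider N(0, 10^2).
    return random.gauss(0, 10) if random.random() < 0.1 else random.gauss(0, 1)

def monte_carlo(dgp):
    # Standard deviation of the sample mean and sample median across pseudo samples.
    means, medians = [], []
    for _ in range(N_SAMPLES):
        sample = [dgp() for _ in range(N)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)

results = {name: monte_carlo(dgp)
           for name, dgp in [("normal", normal_dgp), ("contaminated", contaminated_dgp)]}
for name, (sd_mean, sd_median) in results.items():
    print(f"{name}: sd of mean {sd_mean:.3f}, sd of median {sd_median:.3f}")
```

Under the normal assumption the mean has the smaller spread; under contamination the ranking flips. What we learn depends on what we told the computer.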

I think it is very important to explain what the Neyman-Pearson framework isn’t, and what it means if a null hypothesis isn’t rejected. Here it is very easy to do numerical experiments. It is very easy to choose a data generating process, not tell the students what it is, and figure out things about it in front of them (with massive use of a pseudo-random number generator). I think one should choose 10 data generating processes, have the computer pick one pseudo-randomly at the beginning of the lecture, and figure out which one it picked in front of the students.
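A rough sketch of such a lecture. The ten data generating processes (normal, standard deviation 1, true means 0.00 to 0.45, samples of 30) and the known-variance z-test are hypothetical choices for illustration. The rejection rates show that failing to reject H0: mean = 0 is common even when the null is false, so a non-rejection does not show that the null is true.

```python
import math
import random
import statistics

random.seed(6)
N = 30
# Ten hypothetical data generating processes: normal, sd 1, true means 0.00 ... 0.45.
TRUE_MEANS = [0.05 * k for k in range(10)]

def draw_sample(mu):
    return [random.gauss(mu, 1) for _ in range(N)]

def rejects_null(sample):
    # Two-sided z-test of H0: mean = 0 at the 5% level, treating sd = 1 as known.
    z = statistics.fmean(sample) * math.sqrt(N)
    return abs(z) > 1.96

# The computer secretly picks one process and we test H0 once.
secret_mu = random.choice(TRUE_MEANS)
print("reject H0:", rejects_null(draw_sample(secret_mu)))

# How often each candidate process leads to rejection.
REPS = 2000
rejection_rate = {mu: sum(rejects_null(draw_sample(mu)) for _ in range(REPS)) / REPS
                  for mu in TRUE_MEANS}
for mu, rate in rejection_rate.items():
    print(f"true mean {mu:.2f}: H0 rejected in {rate:.0%} of samples")
print("secret mean was", secret_mu)
```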

Most of all, I think the course should be focused on actual empirical problems: questions which people in the field agree have been answered by analysing data, followed by a presentation of the convincing data analysis.

Basically I agree with Brad.


OK, enough constructive discussion of possible curricula. I will now return to polemic. I claim that asymptotic theory is not useful and is not used. I have seen a lot of it presented and greatly enjoyed the proofs. I have also noticed that *after* the asymptotic analysis, mathematical statisticians consider a special case, generate 10000 (or now, I guess, many more) pseudo data sets, and check the asymptotic analysis using Monte Carlo simulations. Why not cut out the middle man? What was the point of the asymptotic analysis? Is it like the dread DSGE model, used to go from an IS-LM idea, to some way of getting a DSGE model to act that way, and then to an IS-LM explanation of what happened in the computer during the otherwise incomprehensible simulation?

More generally, I would suggest abandoning the concept of infinity. I really think everyone should read Jorge Luis Borges’s “Avatars of the Tortoise”, or at least its opening:

“There is a concept which corrupts and upsets all others. I refer not to Evil, whose limited realm is that of ethics; I refer to the infinite.” Any claim in asymptotic theory has the form “for any positive number epsilon, there is an N so large that this calculation is correct to within epsilon if your sample size is greater than N”.

No asymptotic theory can be used to calculate that N. In my experience, mathematical statisticians have always (always) used numerical simulations to check if some N is large enough.

Applying the insight that there is some N large enough is like solving the halting problem. It is impossible. It is obviously impossible. The fact that it is so obviously impossible causes people to assume that they must have misunderstood something when they have understood perfectly.
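Checking numerically whether some N is large enough, as statisticians do in practice, might look like this. The example is an illustrative choice: the textbook large-sample (Wald) 95% interval for a proportion, whose asymptotic justification says nothing about how large n must be. The simulation estimates its actual coverage for a few assumed combinations of p and n.

```python
import math
import random

random.seed(7)

def wald_interval(successes, n, z=1.96):
    # Large-sample 95% interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def simulated_coverage(p, n, reps=5000):
    # Fraction of pseudo data sets whose interval contains the true p.
    hits = 0
    for _ in range(reps):
        successes = sum(random.random() < p for _ in range(n))
        low, high = wald_interval(successes, n)
        hits += low <= p <= high
    return hits / reps

coverage = {(p, n): simulated_coverage(p, n)
            for p, n in [(0.05, 20), (0.05, 100), (0.5, 200)]}
for (p, n), c in coverage.items():
    print(f"p={p}, n={n}: nominal 0.95, simulated coverage {c:.3f}")
```

With p = 0.05 and n = 20 the nominal 95% interval covers the truth only about 64% of the time, mostly because zero successes produce the degenerate interval [0, 0].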