by Mike Kimel

Some Thoughts on Statistical Analysis (A bit wonky)

Via Tyler Cowen, a pretty good post on statistical analysis. While I suggest reading the post, I’ve reproduced the first sentence of each point the author makes below (though I have cut out the pieces in between):

1. When you don’t have to code your own estimators, you probably won’t understand what you’re doing…
2. When it’s extremely low cost to perform inference, you are likely to perform a lot of inferences…
3. When operating software doesn’t require a lot of training, users of that software are likely to be poorly trained…
4. When you use proprietary software, you are sending the message that you don’t care about whether people can replicate your analyses or verify that the code was correct.

I think for the most part this is a brilliant post, and its main points are very much worth keeping in mind. That said, I tend to disagree slightly with a few points. One is the author’s love of R, stated repeatedly in the pieces of the post I didn’t quote. I find R to be a pain in the #$$. So does the author, but he sees that as a virtue, as per item 2 above. Frankly, the older I get, the more I use (wait for it!) Excel to do statistical analysis. Excel doesn’t do much in the way of sophisticated work, and what it does is clunky. I’m guessing it would violate the author’s point 4, being proprietary, but I’ve checked enough of Excel’s results against the matrix algebra that I’m satisfied its basic regression functionality works. If you want to do anything more sophisticated than a plain old OLS regression, you need to code it yourself in VBA.
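For the curious, the kind of spot-check described above can be sketched in a few lines: fit y = a + b·x by the closed-form OLS formulas (the simple-regression case of the normal equations) and compare the coefficients to what a spreadsheet’s SLOPE/INTERCEPT or LINEST functions report for the same data. The data here is made up for illustration.

```python
# Closed-form simple OLS: slope = S_xy / S_xx, intercept = ybar - slope * xbar.
# Compare these numbers to a spreadsheet's SLOPE/INTERCEPT on the same columns.

def ols_fit(x, y):
    """Return (intercept, slope) for y = a + b*x by ordinary least squares."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Made-up data, roughly y = 2x plus noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = ols_fit(x, y)
```

If the spreadsheet and the hand-rolled formulas disagree beyond rounding, one of them is wrong, which is exactly the point of the exercise.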

But here’s what I love about using Excel… Odds are, whatever data you’re using for your analysis, you started off by sorting it in Excel in the first place. It’s easy to sort and graph the data, and the eyeball and the nose are always the most important tools in statistics. Another cool thing about Excel – it will spit out the residuals. And you can graph and organize those residuals twenty-three ways from Sunday, which means you can build your own diagnostics appropriate for whatever task you’re doing. Unless you absolutely positively have to run a system of equations fast, or you have a client who has glommed onto some absolutely useless statistical tool with a cool name (dig through an advanced econometrics book and you’ll spot ’em aplenty) and you can’t shake them from their ignorance, there isn’t much point to using R or Stata or whatever program you care to name. If your problem is not lack of time, but rather data sets that are too big for Excel to handle, find someone who is good in C++.
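The home-made diagnostics described above amount to: get the residuals out, then sort, bucket, and eyeball them however the task demands. A minimal sketch, with made-up data and the closed-form simple-OLS fit (any fitting routine would do):

```python
# Compute residuals from a simple OLS fit, then run two of the simplest
# "eyeball" checks: the residuals should sum to ~zero by construction,
# and their signs should look random rather than clumped or alternating
# (a crude stand-in for a formal runs test).

def ols_fit(x, y):
    """Return (intercept, slope) for y = a + b*x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.2, 3.8, 6.3, 7.7, 10.2, 11.8]
a, b = ols_fit(x, y)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
total = sum(residuals)                                # should be ~0
signs = ["+" if r > 0 else "-" for r in residuals]    # pattern worth eyeballing
```

From here it is a short step to sorting residuals by any column you like, bucketing them by subgroup, or plotting them against fitted values, which is the whole appeal of rolling your own diagnostics.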

Another thing… I suspect the author of that post and I have similar views on SAS. In my experience with a number of organizations that consider themselves data savvy, if a company has X SAS licenses, it almost invariably has X junior analysts whose job is to reach actionable conclusions using SAS, but who haven’t got the vaguest clue how to interpret the results of even the simplest statistical analysis. This isn’t to say SAS is a bad tool, but merely that many organizations consider having a SAS license to be, in and of itself, a holy grail that allows them to dispense with any real expertise.

One final disagreement with the author – he seems to be falling into a fallacy many economists do, which is assuming that what economists do is a science. This to me indicates a failure to understand that a science is something scientists do, and what economists do is very different from what scientists do. In biology, you won’t get very far if you try to peddle Lamarckian evolution, Lysenkoism, or ID. Try being a proponent of the theory of phlogiston or the aether wind and see what that does for your career in physics. But the equivalent behavior in economics won’t stop you, and may even prove beneficial to those wishing to get tenure at Harvard or Chicago, or to be dean of an Ivy League business school. And then there are the think tanks…