# Why be normal

Robert Waldmann is commenting on a comment at Mark Thoma’s blog. The comment:

NotMarkT ha detto…

One of the difficulties with model selection for assessing tail probabilities is that the empirical data often can’t distinguish between light-tailed (e.g., normal) and heavy-tailed (e.g., Pareto) distributions. The model selection problem then becomes a question of optimism or pessimism in the face of uncertainty. By convention, optimism has won.

(Note that Firefox speaks to me in Italian.)

My reply after the jump.

I think you are absolutely right. However, the victory of optimism is odd, and, I think, impossible to reconcile with Bayesian rationality.

First. If we don’t know whether distributions have thin or thick tails, and the undeniably relevant data aren’t very informative (because we are talking about tails of distributions), then we should have a diffuse posterior. Yet people repeatedly assume that they know for sure that a distribution is normal and the only uncertainty concerns means, variances, and covariances. Acting as if you know something for sure, just because the data can’t prove you are wrong is not rational at all.

Note the key word “undeniably” in the paragraph above. Since well before Gauss, firms and people have suffered because they assumed thin tails. Financiers know this. They all convince themselves that they stress test. Then they all convince themselves that this time it’s different. The lack of sufficient relevant data is based on the assumption that a whole lot of data is irrelevant.

I think that lack of sufficient undeniably relevant data is important, but it can’t be the whole explanation. I think there are three other factors at work.

First, there are agency problems. Traders who are allowed to cash out short-horizon mark-to-market returns are rationally risk-loving and can snow the top managers who are supposed to keep them from running too much risk.

Second, there is self-selection of the subjectively overconfident. If rational people know that they don’t know much of anything, then only irrational people will take massive positions and trade actively. This means that the people who most affect asset prices underestimate the variances of their forecast errors, but it also, and especially, means that the people who matter overestimate the probability that their preferred parametric specification is very close to reality.

Third, there is selection of the people who take on lots of tail risk. Some will get burned by tail risk and be fired. The others will have high sample average returns with low sample variance and get promoted. Promotion can mean a tenfold increase in their allowed gross long position. This is irrational. The top managers mistake luck for skill. Obviously, their self-esteem is based on assuming that what matters is skill, not luck.
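The selection effect in that third point can be made concrete with a small worked example. This is only a sketch: the probabilities, returns, and track-record length below are hypothetical numbers chosen for illustration, not figures from the post.

```python
# All numbers are hypothetical, chosen only to illustrate the selection effect.
p_blowup = 0.02   # assumed chance per period that the tail event hits
gain = 0.01       # assumed return in an ordinary period
loss = -0.50      # assumed return when the tail event hits
periods = 20      # assumed length of the track record under review

# True expected return per period, with the tail event included.
true_mean = (1 - p_blowup) * gain + p_blowup * loss

# Share of traders who never meet the tail event during the track record.
survival_rate = (1 - p_blowup) ** periods

# Every observation in a survivor's record equals `gain`, so the record
# shows a positive sample mean and zero sample variance.
survivor_sample_mean = gain
survivor_sample_variance = 0.0

print(f"true mean return per period: {true_mean:+.4f}")
print(f"share of traders with a clean record: {survival_rate:.3f}")
print(f"survivor sample mean: {survivor_sample_mean:+.4f}, "
      f"sample variance: {survivor_sample_variance:.4f}")
```

Under these assumed numbers, about two thirds of the traders show a spotless record even though the strategy loses money on average, which is the pattern a manager could mistake for skill.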

Finally, last but not least, as you note, optimism is more fun and people don’t like to listen to Cassandras.

“Acting as if you know something for sure, just because the data can’t prove you are wrong is not rational at all.”

Hear, hear! 🙂

“Yet people repeatedly assume that they know for sure that a distribution is normal and the only uncertainty concerns means, variances, and covariances. Acting as if you know something for sure, just because the data can’t prove you are wrong is not rational at all.”

If I had nothing better to do I’d go through the posts from the past year and post this comment at least 365 times. Instead I will simply wait 24 hours or so until I can slap it back in a Bear’s face.

Robert:

Are there no P-values, workhouses, or debtors’ prisons?

When the data distribution is tested, P > 0.05 means the data are treated as normal and P < 0.05 means they are treated as non-normal. There are whole arrays of statistical techniques in Minitab for analyzing data once normal or non-normal status has been determined. In either case, you are correct. Typically, the data are assumed to be normal and normal techniques are used. Graphical Summary in Minitab does distinguish normal from non-normal, and reports mean, median, standard deviation, kurtosis, and skewness. http://www.scientificcomputing.com/images/0801/sc81602b_lrg.gif To determine what distribution non-normal data may follow, you use Individual Distribution Identification in Minitab. Fifteen or so (? – have to renew my lease) distribution models are fitted to the data to determine best fit, in which case we are back to P-values and determining best fit.
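For readers without Minitab, the kind of normality test described above can be sketched in a few lines. The sketch uses the Jarque-Bera statistic, which is an assumption on my part (the thread does not say which test Minitab runs), and its chi-square p-value is an asymptotic approximation that is rough at small sample sizes. The heavy-tailed data are drawn from a Student t distribution with 3 degrees of freedom, also chosen only for illustration.

```python
import math
import random

def jarque_bera(xs):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

def student_t3(rng):
    """Draw one heavy-tailed value: Student t with 3 degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return z / math.sqrt(chi2 / 3.0)

def rejection_rate(n, trials, rng, alpha=0.05):
    """Fraction of heavy-tailed samples of size n flagged as non-normal."""
    rejected = 0
    for _ in range(trials):
        sample = [student_t3(rng) for _ in range(n)]
        _, p_value = jarque_bera(sample)
        if p_value < alpha:
            rejected += 1
    return rejected / trials

rng = random.Random(1)
rate_small = rejection_rate(20, 100, rng)
rate_large = rejection_rate(1000, 100, rng)
print(f"n = 20:   non-normality detected in {rate_small:.0%} of samples")
print(f"n = 1000: non-normality detected in {rate_large:.0%} of samples")
```

The comparison of the two rates is the point: the same test on the same heavy-tailed source gives very different answers depending on how many observations are available.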

Robert, I will add two other criteria for poor analysis. Typically not enough data samples are taken, and it is assumed the data correlate to something when at best they point a direction, whether normal or non-normal. While I agree with your message, I disagree that one cannot tell if there is skewness or kurtosis or whether the data are normal or non-normal. The means exist, and people are either ignorant of the techniques to determine such or do not care.

So where does this place financiers . . . snake oil salesmen? If they cannot determine risk, then we are all sailing on a hope and a prayer.

Jay:

Be my guest in posting it, but your assumption would be no better than theirs. In most cases, the people supplied the data from which they made the assumptions. I tested cactus’s data and found it to be normal. In any case, the determination of normality only points a direction, which you can accept or not accept. One can always piss in the wind by disagreeing with the direction pointed by the data, too.

Just don’t go calculating the power of a sample not much more than n = 20 (the number of presidential terms since 1930).

And Minitab? I thought people moved on to SAS or something more powerful after undergrad.

Was this supposed to be some sort of arch remark about 0.05 value? Jay, SAS is the package of choice, where possible, but it’s expensive; our license fee runs up to ~$100K/year. I’m not even sure it’s possible to buy a copy of it outright. I’d go with SPSS next, and Minitab is, as you say, something that undergraduate students are very comfortable with.

The issue is that events outside the range measured can clearly affect the curve. For example, in housing, not including the price declines of the Great Depression changed the shape of the curve. In a similar manner, the survey the USGS did of the Colorado River canyons showed that major changes happened in episodic floods every few centuries. This is an area where trying to go completely by the numbers leads us astray. Using historical records that may not be of the quality to go into the numerical calculation can at least provide a clue as to how fat the tail is. Since either choice of tail matches the input data sets, one could add information by being qualitative. Or, alternatively, always assume a fat tail (i.e., Mr. Murphy and his law will come along when least expected, what can go wrong will go wrong, and perhaps even the corollary that Murphy was an optimist). Assume the worst and you won’t go bust, but you may not make as much money.

Just to be clear here… are we being asked to doubt results from Minitab? Minitab finds (or does not find) kurtosis where SAS would not (would)? Maybe just a case of “my daddy can whip your daddy”?

This is slightly off topic, but do people ever use R? It’s open source.

I’ve never used SAS, but my statistical work is limited to very simple problems. Maybe R is not complete enough to do real statistics.

One item that emerges is hiding behind quantitative numbers as an alternative to thinking. This is sort of a version of the old “figures don’t lie, but liars figure,” which of course is encouraged by computers, which make figuring easier. One should always look at a calculated result and ask, “Does this make sense to me, based upon my own experience and knowledge?” In the old days with slide rules one tended to do this more, and of course one had maybe 2.5 digits of information, plus having to keep the exponent in one’s head.

One concern not addressed here is the assumption that the history contains a single, stable level of volatility.

For equities, the VIX indicates traders’ perception of future volatility; since the VIX is per se tradable, a better forecast than the consensus is directly profitable. Without claiming HOW forecastable it is, I’ll note that it has experienced a 5:1 range over the last couple of years.

Suppose that the VIX forecast one-day-ahead volatility perfectly, and that the outcome return was drawn from a perfect normal distribution, conditioned only on the volatility. Then the unconditional distribution of returns would be extremely fat-tailed, failing all the tests of normality.

But note: this hypothesized outcome, which I’m a bit lazy in substituting for the actual data that I claim it resembles, fails normality tests not because of a failure of normality, but because of the failed assumption of homoskedasticity.
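The argument above can be checked with a short simulation. This is a minimal sketch under stated assumptions: volatility is taken to switch at random between two hypothetical levels spanning a 5:1 range, and each return is drawn from a normal distribution given that day's volatility. A normal distribution has kurtosis 3, so a pooled kurtosis well above 3 indicates fat tails.

```python
import random

def sample_kurtosis(xs):
    """Fourth central moment divided by the squared variance (normal = 3)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2)

def simulate_returns(n, vol_levels, rng):
    """Each return is normal given its volatility, picked at random per day."""
    return [rng.gauss(0.0, rng.choice(vol_levels)) for _ in range(n)]

rng = random.Random(0)
# Hypothetical volatility levels: one constant level versus a 5:1 range.
constant_vol = simulate_returns(100_000, [1.0], rng)
varying_vol = simulate_returns(100_000, [1.0, 5.0], rng)

k_const = sample_kurtosis(constant_vol)
k_vary = sample_kurtosis(varying_vol)
print(f"kurtosis with constant volatility: {k_const:.2f}")
print(f"kurtosis with 5:1 volatility range: {k_vary:.2f}")
```

For this two-level case the theoretical pooled kurtosis is 3 · E[σ⁴] / E[σ²]² = 3 · 313 / 169 ≈ 5.6, even though every single return is conditionally normal.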

Of course Minitab will calculate the same kurtosis as SAS. I was just surprised that people around here use Minitab. Everyone I know that got to graduate level and above moved on to more powerful tools.

If you have ever done power calculations on samples of size n = 20 you would understand that miscalculation is not necessary to get results that should make us cautious before claiming we have the absolute truth.
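A rough illustration of that point about n = 20, as a sketch only: it assumes a two-sided one-sample z-test with known standard deviation, and the effect sizes (true shift in standard-deviation units) are hypothetical. Real analyses of 20 presidential terms would involve other tests, so the numbers are indicative at best.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known standard deviation.

    effect_size is the true mean shift measured in standard deviations.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * sqrt(n)
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

# Hypothetical effect sizes: small, medium, large.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: power at n = 20 is {z_test_power(d, 20):.2f}")

power_medium = z_test_power(0.5, 20)
power_small = z_test_power(0.2, 20)
```

Under these assumptions a medium effect is detected only about six times in ten with 20 observations, and a small effect far less often, so a non-significant result at that sample size says little.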

Hate to say it but this sounds a lot like what Nassim Taleb has been espousing…

SOV:

No arch remark. Minitab is what I use when I walk into companies and they ask me to point the way or direction of the issues at hand. I can lease Minitab for $400/year or buy it outright for less than $1,500, far less than the $100,000 you quoted for SAS. What company would spend that much? Minitab provides most of what companies need and will test for normality, the topic Robert has chosen to discuss. If you check what companies are using for LSS, it’s Minitab. Demean it as you will, SOV, it works, and I have used it numerous times, as have many engineers.

Jay:

I do not believe anyone claimed there is absolute truth in the statistical analysis. All the analysis will do is determine whether the distribution is normal or not. Nestle, Caterpillar, Kraft, Owens-Illinois, etc. are using Minitab in LSS to do their statistical analysis, drawing batches of 20-30 on various components and collecting 300 samples (10 batches) over a period of time. It is tedious work to select the samples from each batch and weigh them or measure them if necessary.

Jay:

Minitab is still the weapon of choice for business in doing statistical analysis and LSS, unless you choose to use Datafit for multiple linear regression analysis, which is probably the same as SAS. Unfortunately, I don’t sit on an academic pedestal. I am sure SAS is a very good tool.

I think the real agency problem is that most traders, particularly the traders who make big market making trades, have no risk. Their salaries are high enough so that surviving perhaps a year on the job is enough to set them up for life. Sometimes all they need to do is cash their signing bonus check. That means that all other “risks” are not risks to them at all. They have no downside, save not getting enough money for a seriously enhanced lifestyle or, perhaps if medical developments allow, enough money for a second or third life.