Uncertainty for a Recovering Bayesian Fortune Teller
Attention conservation notice: this is a comment on a five-year-old Cosma Shalizi post, which itself begins “Attention conservation notice: 2300 words of technical …”
Shalizi argues that a Bayesian must have absolute confidence that he knows the probability of any event — a diffuse prior just means that the probability that variables will be within a given range is low.
I quote a passage which might be too brief to communicate his point.
He introduces a random sequence X_i:
I have a standard-issue Bayesian agent. The agent has a hypothesis space, each point m of which is a probability distribution for the random sequence. This hypothesis space is measurable, and the agent also has a probability measure, a.k.a. prior distribution, on this space. The agent uses Bayes’s rule to update the distribution by conditioning, so it has a sequence of measures D_0, D_1, etc.
[skip] Now I pick my favorite observable event f, a set in the joint sigma-field of the X_i. For each hypothesis m, the probability m(f) is well-defined.
The Bayesian, by definition, believes in a joint distribution of the random sequence X and of the hypothesis M. (Otherwise, Bayes’s rule makes no sense.) This means that by integrating over M, we get an unconditional, marginal probability for f:
P_n(f) = E_{D_n}[M(f | X_1 = x_1, X_2 = x_2, …, X_n = x_n)]
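To make the formula concrete, here is a toy worked example of my own construction (not Shalizi's): a hypothesis space with just two points, each a coin-bias hypothesis, a prior over them, and the marginal probability of an event f obtained by integrating m(f) against the posterior.

```python
# Toy illustration: two hypotheses about a coin's bias, a prior over them,
# and the marginal probability of the event f = "the next flip is heads"
# obtained by averaging m(f) over the posterior D_n.

def posterior(prior, likelihoods):
    """Bayes's rule: update the prior over hypotheses given observed data."""
    joint = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypotheses: the coin lands heads with probability 0.5 or 0.9.
biases = [0.5, 0.9]
prior = [0.5, 0.5]                       # D_0

# Observe three heads in a row; the likelihood of that data under each m.
likelihoods = [b ** 3 for b in biases]
post = posterior(prior, likelihoods)     # D_3

# P_3(f) = E_{D_3}[m(f)]: sum of posterior weight times m(f) over hypotheses.
p_f = sum(w * b for w, b in zip(post, biases))
print(post, p_f)                         # p_f is about 0.84
```

Note that p_f is a single sharp number: as Shalizi says, the standard Bayesian agent ends up with a definite probability for f, however diffuse the prior was.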
This is a problem, at least for the hypothetical person who thinks we are Bayesian, since we don’t have complete confidence in our subjective probabilities (in fact we usually don’t have subjective probabilities at all, just the feeling that something is likely, unlikely, very unlikely, and the like). There is also a problem for those who say we should be Bayesian, as great confidence in calculated probabilities caused some difficulty back in 2008 (Shalizi was commenting on a 2006 book which denounced foolish financiers who had too much confidence in their models).
There is a simple solution which vaguely reminds me of high breakdown point estimators, that is, Huber’s approach to robust estimation. His idea was not to assume that all of the data were generated by a process with a finite number of estimated parameters, but rather that, say, 90% of it was, and the rest could be generated by any process at all. This means that the assumptions made by the statistician are something which might actually be true.
OK, so Reverend Bayes, let me introduce you to Professor Huber. I consider a recovering Bayesian agent who realizes that even the most diffuse prior is too strong a belief to be possible. The recovering Bayesian introspects and realizes that a priori she has a hypothesis space, each point m of which is a probability distribution for the random sequence, and a subjective probability density over each m, but notes that, integrating over all m, the probabilities sum to only, say, 90%. The agent has a 10% subjective probability that the actual distribution of the random sequence is something she hasn’t dreamt of, perhaps something she can’t even conceive.
I don’t see any way to improve on this original hunch of a 10% chance of something strange — certainly Bayes’s formula is no help. But the agent can update the probabilities within the 90%.
This gives posterior beliefs about the probabilities of events f of the form: the probability is between p and p + 10%. Under the ever-present subjective 10% possibility that the world is undreamed of, all she can conclude is that the probability can’t be less than zero or greater than one.
This means that what was a 95% interval of the Bayesian posterior for, say, X_1000 is, post data, a Bayes-Huber subjective 85.5% interval (0.95 × 0.90 = 0.855). There is no 95% interval at all, and the only 90% interval is the set of all conceivable values of X_1000.
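The bookkeeping above can be sketched in a few lines. This is my own minimal construction, assuming the simplest version of the scheme: 90% of subjective probability spread over conceived hypotheses, the remaining 10% committed to nothing at all.

```python
# Sketch of the "recovering Bayesian" arithmetic: only 90% of subjective
# probability lives on hypotheses the agent can state; the other 10% is
# reserved for worlds she hasn't dreamt of.

CONCEIVED_MASS = 0.90            # prior weight on conceivable hypotheses
UNKNOWN_MASS = 1 - CONCEIVED_MASS

def event_probability_interval(p_within_model):
    """Given the probability of an event f computed within the conceived
    hypotheses, return the interval of total subjective probabilities.
    The unknown 10% can contribute anywhere from none to all of its mass."""
    lo = CONCEIVED_MASS * p_within_model
    hi = CONCEIVED_MASS * p_within_model + UNKNOWN_MASS
    return lo, hi

# An in-model probability of 0.5 becomes the interval [0.45, 0.55]:
# "the probability is between p and p + 10%", with p = 0.45 here.
print(event_probability_interval(0.5))

# A 95% interval computed within the model carries only
# 0.95 * 0.90 = 85.5% of total subjective probability.
print(0.95 * CONCEIVED_MASS)
```

The interval always has width exactly 10%, whatever the in-model probability, which is why no amount of data shrinks it below the original hunch.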
This seems to me to be at least suitably cautious. It also reminds me that Brad got back to Shalizi in 2009 while defending Nate Silver, who did something roughly like this.
I have some problems with this idea. First, I don’t have a set of hypotheses which I am 90% sure contains the truth. It’s more like being 90% sure the world is different from any possible world I have imagined, which leaves only 10% for the hypotheses, so I am stuck with huge 9% intervals. I can’t live with that. Second, there has to be some way to improve on the 10%. Third, I have no idea where the prior probabilities, including the 10%, might come from. Finally, of course, I am sure the idea is not original and is probably decades old.
Oh, the 10% extra can also reflect doubts about the prior over the hypotheses in the set M.
I certainly don’t know what I am talking about, but I did pick up a book about the Bayesian “idea that wouldn’t die.” It turned out to be a People-magazine style history of Bayesian influence on some world affairs, but it couldn’t bring itself to discuss the actual math.
I got the idea that the Bayesian approach had proved valuable in, say, decoding the German “Enigma” cipher and looking for lost H-bombs.
So I am guessing wildly that there is something valuable in the approach of “refining your guesses.” But from your essay here, and my own distaste for certain assumptions that financial managers use all the time (I am told), it seems that something like “fuzzy math” turns out to work better, or at least be easier to work with, than rigorous but largely inappropriate old-fashioned statistics… which is good for guessing how many light bulbs might be defective based on a sample.
Not sure this helps, and I am sure that people who like mathematical games will not give up trying to make it all perfectly rigorous, but I suspect it might be a tool to be used in some situations and not others.
And of course not to take the “probabilities” of economists seriously.
When researchers first started figuring out how to make a computer think like a human, they assumed that people reasoned logically with ideas flowing from premise to conclusion in a series of syllogisms, but that humans did this imperfectly, which meant that computers would be able to do anything that humans could, but perfectly. This is still a common idea.
In practice, when researchers actually managed to program computers to think like humans, they found out that to do so, computers had to reason inductively using Bayesian statistics. I recently helped my niece get through her Stanford CS courses, so I was rather impressed with the sheer dominance of the Bayesian approach in practical matters. Bayesian reasoning is at the heart of how spell checkers, error correctors, speech recognizers, text categorizers, visual recognizers, conversation systems, image processors, navigators, and even social network and secure system analyzers work.
The basic undergraduate approach would start with some corpus to estimate a set of priors. (For example, a dictionary for a textual system.) Then there would be the reasoning which might involve mathematics (e.g. estimating probabilities based on known error modes, such as leaving out, adding or changing a character). Then, they’d build the actual application. Unspoken was the need to update one’s probabilities and priors – that’s how they talk – and the need to explore mechanisms to apply more conventional probabilistic analysis. I’m sure it’s more sophisticated at the graduate level, so I’m thankful she’s planning on B school, not a CS PhD.
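The undergraduate recipe described above can be sketched as a toy noisy-channel spell corrector. This is my own minimal construction, not any particular course's code: a tiny corpus supplies the priors P(word), a crude one-keystroke error model supplies P(typo | word), and Bayes picks the word maximizing their product.

```python
# Minimal noisy-channel spelling sketch: priors from a corpus, a crude
# error model for one-keystroke slips, and a Bayesian argmax over words.

from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()
priors = {w: c / len(corpus) for w, c in Counter(corpus).items()}

def edits1(word):
    """All strings one deletion, insertion, or substitution away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    inserts = {a + c + b for a, b in splits for c in letters}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in letters}
    return deletes | inserts | substitutes

def correct(typo, p_error=0.1):
    """Score each dictionary word by prior times a crude error likelihood."""
    def likelihood(word):
        if word == typo:
            return 1 - p_error      # typed correctly
        if typo in edits1(word):
            return p_error          # one keystroke slip away
        return 0.0                  # ignore rarer error modes
    return max(priors, key=lambda w: priors[w] * likelihood(w))

print(correct("caat"))   # → cat
```

A real system would estimate the error probabilities from confusion data and use a much larger corpus, but the structure, prior times likelihood, is the same.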
The experience made me appreciate logic even more. Imagine the process by which first order logic emerges from a Bayesian haze of probability. After aeons, the priors are finally adjusted and TRUE AND TRUE = TRUE, but TRUE AND FALSE = FALSE. What a long strange trip it has been.
It seems to me that Bayesian reasoning is at the heart of logical induction. One cannot really know the world, one can only experience to update one’s probabilities. As for priors, they are what we have evolved to know. There are location and time neurons for remembering and reasoning about space and time. There are sequence neurons for planning and pattern recognition. There are change neurons for binocular, motion and color vision. There are sensory neurons that tie directly to the emotions. There are mirror neurons for analyzing one’s own body and the bodies of others. Lately Kant’s Critique of Pure Reason has been showing up more and more often in the scientific literature. It seems Kant and Bayes may have been on to something.
thanks for this. i am too old for the math anymore, but i wish i wasn’t.
the study of brains may have made much progress since the last time i looked at the research, but my guess is still that the brain doesn’t have much to do with logic. it’s a free association machine, and fortunately for it the world is mostly “logical” and a disciplined person can sometimes achieve something like logic by forcing his free associations to behave themselves.
i have no idea how a “person” does this, much less a brain, but I have observed that it is relatively rare. even great scientists don’t bother with it most of the time.
As I recall, Bayes Theorem/Formula tells how to adjust general probabilities for additional constraints or conditions (known as “conditional probabilities”), such as when new information is obtained. If I am playing Russian Roulette with a six-shooter and it is my turn after two other players, my chances of losing have increased from 1 in 6 to 1 in 4. It works great when you know the underlying probability distribution and only need to adjust that distribution’s parameters. When you are guessing at the underlying probability distribution, as Drs. Shalizi and Waldmann say, the result of Bayes Formula is a guess.
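The Russian-roulette update above is just exact counting, and a tiny sketch (my own toy) makes the conditioning explicit: each empty click by an earlier player shrinks the sample space.

```python
# One bullet uniformly placed among six chambers; conditioning on all
# previous chambers having been empty shrinks the sample space.

from fractions import Fraction

def p_lose(chambers_remaining):
    """P(bullet in the next chamber | all earlier chambers were empty).
    With one bullet placed uniformly, this is 1 / chambers_remaining."""
    return Fraction(1, chambers_remaining)

print(p_lose(6))   # before anyone fires: 1/6
print(p_lose(4))   # after two empty clicks: 1/4
```

This is the happy case the comment describes: the underlying distribution is known exactly, and Bayes just reallocates probability over the chambers that remain.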
But in life, faced with a decision, you make your best guess. If it turns out wrong but you survive anyway, next time you try a different guess. This is how natural evolution works, and given enough time it can produce amazing results.
(I don’t think I am saying anything different than the above post or comments but maybe different words might help a few people who wonder what we are talking about.)
When you say “So I am stuck with huge 9% intervals. I can’t live with that,” what do you mean? With a few notable exceptions, I teach the straight Frequentist approach; accordingly, I tell my students that sometimes you’ve just got to live with wide confidence intervals. Have I been wrong all these years?
it’s entirely possible. I don’t know what you’ve been teaching, but it sounds like you may have missed out on the Bayesian revolution. If you have the math, you might want to look at it and see what it does and does not do. I am far, far from being any kind of expert… though I did teach statistics once… but the little bit I have read makes me think there is “something” there. So, not “wrong” perhaps, but missing out on a new tool that does new things.
You may take it as a given, Coberly, that I do indeed know a bit about Bayesian statistics (conditional probabilities and Bayes’ formula have been around for a long time.) I don’t teach Bayesian statistics because a) my students have a hard enough time wrapping their heads around conditional probabilities and b) teaching it is dangerous to people who know too little. I don’t discuss regression much with my first-years for just this reason — regression is a dangerous tool in the hands of people who don’t know how to use it.
fair enough. i did have some exposure to statistics and statisticians and i have some reservations about the way they use statistics. i am not sure what Robert was getting at, and I wondered if the “9%” was a typo for 90%.
I’ll avoid further comment because I really don’t know anything… though in the past I have found that sometimes not knowing anything is an advantage over the people who know something and are so smart they don’t have to even think about it.
@ScentOfViolets when I wrote “I can’t live with that” I meant it literally. In our professional peer reviewed journal lives, we can live with huge confidence intervals (so long as we don’t care about actually being published). But in every day life we have to be confident without proof.
So say I am in a room on what I estimate to be the 50th floor of a building, and I think it would be better to leave through the door than the window. Enough skepticism about the unsolved problem of induction could lead me to decide that it’s about equally risky to exit through the door or through a window.
Acting as an ex-Bayesian, or even as a Bayesian with a diffuse enough prior over things like that, leads to a Robert Waldmann pancake on the sidewalk.
Something which would be seen as irrational overconfidence from a very defensible point of view is strictly necessary to get through the day. I think. Interestingly, I arrived at this conclusion talking with Dan Smith about counterfactual conditionals, and now I am sitting 3 +/- 10,000 feet from him.
Kaleberg helped me a little. But I keep getting the idea that professional statisticians are trying to do something with Bayes that it won’t do: give you a meaningful probability for one-time events.
On the other hand “guessed” probabilities, refined by experience, seem to be useful in solving problems where, unlike Zeno’s Achilles, close enough for practical purposes is close enough.
Now, once again, I don’t know anything. But I bet there are others out here who would profit from a brief tutorial.
Now I see what you mean, Robert. But it turns out that in real life these situations occur all the time. That’s where adaptive heuristics come in. Or don’t, as the case may be; it seems that there are at least a couple of reported deaths every year from people who don’t know better trying to walk across a filled grain silo, for example. Perturbatively non-renormalizable theories of gravity require an infinite number of measurements before even one prediction can be made, if you want a more abstract real-world example. Sometimes the limits of knowledge are very close indeed.
I think the point of confusion is when to apply Bayesian methods. The key criteria are that a probability distribution exists underlying the value being measured and that this probability distribution is static; that is, it can’t change over time. This last point is frequently ignored, and it is the one that makes the use of Bayesian methods inappropriate for many economic models, since that guarantee of an unchanging probability distribution can’t be made.
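A toy simulation (my own construction, assuming the simplest conjugate setup) shows what goes wrong when the stationarity assumption fails: a Beta-Bernoulli update presumes one fixed success probability, so when the true probability shifts mid-stream, the posterior mean settles on a blend of the two regimes rather than on the current value.

```python
# A Beta-Bernoulli update applied to non-stationary data: the true success
# probability jumps from 0.2 to 0.8 halfway through, but the conjugate
# update keeps pooling all observations as if one fixed p generated them.

import random

random.seed(0)

alpha, beta = 1, 1   # uniform Beta(1, 1) prior on the success probability

for i in range(1000):
    p_true = 0.2 if i < 500 else 0.8    # regime change at i = 500
    if random.random() < p_true:
        alpha += 1   # success
    else:
        beta += 1    # failure

posterior_mean = alpha / (alpha + beta)
print(posterior_mean)   # near 0.5: a blend of the regimes, not the current 0.8
```

The posterior is also very concentrated by this point, so the agent is confidently wrong, which is the economic-modeling worry in miniature. Practical fixes (discounting old data, explicit change-point models) amount to admitting the distribution can move.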