Attention conservation notice. This is a comment on a five-year-old Cosma Shalizi post which begins “Attention conservation notice: 2300 words of technical”
Shalizi argues that a Bayesian must have absolute confidence that he knows the probability of any event; a diffuse prior just means that the probability that variables will be within a given range is low, not that it is unknown.
I quote a passage which may be too brief to communicate his point.
He introduces a random sequence X_i.
I have a standard-issue Bayesian agent. The agent has a hypothesis space, each point m of which is a probability distribution for the random sequence. This hypothesis space is measurable, and the agent also has a probability measure, a.k.a. prior distribution, on this space. The agent uses Bayes’s rule to update the distribution by conditioning, so it has a sequence of measures D0, D1, etc.
[…] Now I pick my favorite observable event f, a set in the joint sigma-field of the X_i. For each hypothesis m, the probability m(f) is well-defined.
The Bayesian, by definition, believes in a joint distribution of the random sequence X and of the hypothesis M. (Otherwise, Bayes’s rule makes no sense.) This means that by integrating over M, we get an unconditional, marginal probability for f:
$$P_n(f) = \mathbb{E}_{D_n}\left[M(f \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)\right]$$
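To make the formula concrete, here is a minimal sketch, assuming a hypothetical three-point hypothesis space of i.i.d. coins (the names THETAS, posterior and predictive are mine, not Shalizi’s), with f the event that the next draw X_{n+1} equals 1. Whatever the data, the agent ends up with one definite number for P_n(f), which is Shalizi’s point.

```python
# Hypothetical finite hypothesis space: each m is an i.i.d. coin with
# P(X_i = 1) = theta, so m(f | x_1..x_n) = theta for f = {X_{n+1} = 1}.
THETAS = [0.25, 0.5, 0.75]


def posterior(prior, data):
    """D_n: condition the prior D_0 on x_1..x_n with Bayes's rule."""
    weights = []
    for theta, p in zip(THETAS, prior):
        likelihood = 1.0
        for x in data:
            likelihood *= theta if x == 1 else 1.0 - theta
        weights.append(p * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def predictive(prior, data):
    """P_n(f) = E_{D_n}[M(f | X_1 = x_1, ..., X_n = x_n)] for f = {X_{n+1} = 1}."""
    d_n = posterior(prior, data)
    return sum(w * theta for w, theta in zip(d_n, THETAS))


prior = [1 / 3, 1 / 3, 1 / 3]  # D_0: uniform over the three hypotheses
data = [1, 1, 0]               # observed x_1, x_2, x_3
p_f = predictive(prior, data)  # one definite number, about 0.575
```

Here the posterior weights come out as 0.15, 0.4 and 0.45, and averaging each coin’s probability of heads under them gives P_3(f) ≈ 0.575, with no room left for doubt about that number.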
This is a problem, at least for the hypothetical person who thinks we are Bayesians, since we don’t have complete confidence in our subjective probabilities (in fact we usually don’t have subjective probabilities at all, just the feeling that something is likely, unlikely, very unlikely and so on). It is also a problem for those who say we should be Bayesians, since great confidence in calculated probabilities caused some difficulty back in 2008 (Shalizi is commenting on a 2006 book which denounced foolish financiers who had too much confidence in their models).
There is a simple solution which vaguely reminds me of high-breakdown-point estimators, that is, of Huber’s approach to robust estimation. His idea was not to assume that all of the data were generated by a process with a finite number of estimated parameters, but rather that, say, 90% were, and that the rest could be generated by any process at all. This means that the statistician’s assumptions are something which might actually be true.
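Huber’s gross-error (ε-contamination) model can be illustrated with a minimal sketch. The assumptions are all mine: the 90% core is standard normal, and the “any process” 10% is, purely for illustration, a point mass at 1000. huber_location is a textbook iteratively reweighted Huber M-estimate of location with the conventional tuning constant k = 1.345 and a fixed MAD scale, not code taken from Huber. The sample mean is dragged toward the contamination while the median and the Huber estimate stay near 0.

```python
import random
import statistics


def contaminated_sample(n, eps=0.10, seed=0):
    """Draw n points from Huber's gross-error model (1 - eps) * G + eps * H.

    Assumptions for illustration only: G is N(0, 1), and H, the "any process"
    part, is a point mass at 1000.
    """
    rng = random.Random(seed)
    return [1000.0 if rng.random() < eps else rng.gauss(0.0, 1.0) for _ in range(n)]


def huber_location(xs, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted means.

    Points within k * scale of the current estimate get weight 1; points
    farther out get weight k * scale / |residual|, which bounds their pull.
    The scale is held fixed at the normalized median absolute deviation.
    """
    mu = statistics.median(xs)
    scale = 1.4826 * statistics.median([abs(x - mu) for x in xs])
    for _ in range(max_iter):
        cutoff = k * scale
        weights = [1.0 if abs(x - mu) <= cutoff else cutoff / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


xs = contaminated_sample(1000)
mean_est = statistics.mean(xs)      # dragged toward the contamination (roughly 100)
median_est = statistics.median(xs)  # stays near 0
huber_est = huber_location(xs)      # stays near 0
```

The design choice that matters is the bounded weight: however wild the 10% is, each contaminating point pulls the Huber estimate by a bounded amount, whereas the mean has no such bound.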
OK, so, Reverend Bayes, let me introduce you to Professor Huber. Consider a recovering Bayesian agent who realizes that even the most diffuse prior is too strong a belief to be possible. The recovering Bayesian introspects and realizes that a priori she has a hypothesis space, each point m of which is a probability distribution for the random sequence, and a subjective probability density over the points m, but she notes that integrated over all m this density adds up to only, say, 90%. The agent has a 10% subjective probability that the actual distribution of the random sequence is something she hasn’t dreamt of, perhaps something she can’t even conceive.
I don’t see any way to improve on this original hunch of a 10% chance of something strange; certainly Bayes’s formula is no help, since it would need the likelihood of the data under possibilities she cannot even conceive. But the agent can update the probabilities within the 90%.
This gives posterior beliefs about the probabilities of events f of the form: the probability is between p and p + 10%, where p is the probability of f computed from the 90% of hypotheses she has imagined. In the always subjective 10% possibility that the world is undreamed of, all she can conclude is that the probability can’t be less than zero or greater than one.
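In symbols, as a sketch with notation that is mine rather than Shalizi’s or Huber’s: π is her subjective density over the imagined hypotheses, with total mass 1 − ε; $\bar D_n$ is her posterior over those hypotheses, renormalized to total mass one; and Q is whatever undreamt-of distribution holds the remaining ε, about which she knows only that Q(f) lies in [0, 1].

$$\int_{\mathcal{M}} \pi(m)\,dm = 1 - \varepsilon, \qquad \varepsilon = 0.1$$

$$\bar p_n(f) = \mathbb{E}_{\bar D_n}\left[M(f \mid X_1 = x_1, \ldots, X_n = x_n)\right]$$

$$P_n(f) = (1 - \varepsilon)\,\bar p_n(f) + \varepsilon\,Q(f), \qquad 0 \le Q(f) \le 1$$

$$\Longrightarrow \quad (1 - \varepsilon)\,\bar p_n(f) \;\le\; P_n(f) \;\le\; (1 - \varepsilon)\,\bar p_n(f) + \varepsilon$$

So the p of the text is the mass $(1 - \varepsilon)\,\bar p_n(f)$ contributed by the imagined hypotheses, and the width of the band is always ε, whatever the data.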
This means that a 95% interval of the ordinary Bayesian posterior for, say, X_1000 is only an 85.5% (0.9 × 95%) subjective interval for the post-data Bayes-Huber agent. There is no 95% interval, and the 90% interval will be all conceivable values of X_1000.
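The interval arithmetic can be checked in a few lines. This is a minimal sketch; coverage_bounds and nominal_needed are hypothetical helper names, and EPS = 0.10 is the text’s 10%. An interval holding a fraction `nominal` of the ordinary posterior holds X_1000 with subjective probability between (1 − ε) × nominal and that plus ε.

```python
# EPS is the subjective probability the agent reserves for distributions she
# has not imagined: the 10% in the text. Helper names are hypothetical.
EPS = 0.10


def coverage_bounds(nominal, eps=EPS):
    """Subjective probability bounds for an interval that holds a fraction
    `nominal` of the ordinary posterior (computed over imagined hypotheses).

    The imagined part contributes (1 - eps) * nominal; the undreamt-of part
    may put anywhere from 0 to all of its eps on the interval.
    """
    lower = (1.0 - eps) * nominal
    return lower, lower + eps


def nominal_needed(target, eps=EPS):
    """Ordinary-posterior coverage needed so the lower bound reaches target.

    Returns None when target > 1 - eps: even full posterior mass only gives
    1 - eps, so no interval short of all conceivable values guarantees it.
    """
    if target > 1.0 - eps:
        return None
    return target / (1.0 - eps)


lo, hi = coverage_bounds(0.95)  # about (0.855, 0.955)
no_95 = nominal_needed(0.95)    # None: no 95% interval
need_90 = nominal_needed(0.90)  # 1.0: all of the ordinary posterior mass
```

A 90% guarantee therefore requires an interval carrying all of the ordinary posterior mass, which for a posterior with unbounded support means every conceivable value. Setting eps=0.9 instead turns a 90% posterior interval into one with a lower bound of only 9%.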
This seems to me to be at least suitably cautious. It also reminds me that Brad went back to Shalizi (2009) while defending Nate Silver, who did something roughly like this.
I have some problems with this idea. First, I don’t have a set of hypotheses which I am 90% sure contains the truth. I am rather more along the lines of 90% sure that the world is different from any possible world I have imagined. So I am stuck with huge intervals which carry only 9% subjective probability (the 10% I give my imagined worlds times the 90% of an ordinary posterior interval). I can’t live with that. Second, there has to be some way to improve on the 10%. Third, I have no idea where the prior probabilities, including the 10%, might come from. Finally, of course, I am sure the idea is not original and is probably decades old.
Oh, the extra 10% can also reflect doubts about the prior over the hypotheses in the set M.