I dare to disagree with Nate Silver
Robert Waldmann
Nate Silver, like essentially all election handicappers, ignores internal polls (polls financed by one of the candidates or by the party of one of the candidates). Unusually, he explained why, noting that even if the polls are unbiased there is extreme publication bias, as campaigns release the polls if and only if they like the results. I think that, if this is the explanation of the known bias in published internal polls, then internal polls should be included in averages of polls.
update: This post is basically a critique of a post by Nate Silver to which I didn’t link. The assumptions I use below are based on things he conceded for the sake of argument. He presents a simple formula which I critique. All quotes from the post are updates.
I’m not sure why people take polls released by campaigns at face value. This does not mean that campaigns don’t have very good pollsters working for them. But the subset of polls which they release to the general public is another matter, and are almost always designed to drive media narrative.
So the polls are OK when conducted, but the choice to release the polls is strategic and therefore the polls should not be taken “at face value.”
What we’ve found is that polls commissioned by campaigns and released to the public show, on average, a result that is about 6 points more favorable to their candidate’s standing than nonpartisan polls released at the same time. (Other analysts have found similar results.) So, just as a first cut, you might take a Democratic internal poll that shows a tied race and “translate” it into nonpartisan terms by adding 6 points to the Republican’s margin.
I note the weasel phrase “as a first cut” and argue below that a better first cut is to just include internal polls unadjusted if, as Silver semi asserts in the first quoted passage, the problem is just publication bias. The rough 6% correction is definitely not the best first cut. I say 0% is best but 6% is definitely too much because of a universally accepted argument discussed below.
end update
All campaigns appear to believe that reporting a poll which would be good for the candidate if accurate is definitely good for the candidate (they believe in a bandwagon effect). So the publication bias hypothesis makes sense. I am going to make absolutely key unsupported assumptions in order to write down a model. Then I will pretend the model is reality.
There are two candidates A and B.
Assume that there is an unbiased, agreed upon average or consensus based on public polls: A ahead by dpublic. Assume each campaign conducts one unbiased poll and that these polls are of equal quality (same sample size etc). Assume A releases A’s poll if it shows A ahead by more than dpublic and B releases B’s poll if it shows A ahead by less than dpublic.
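To get a feel for how large a gap this release rule produces on its own, here is a minimal Python sketch. Everything numeric in it is an assumption made for illustration: a true lead of 0, a standard deviation of 4 points for each poll's estimate of the lead, normal sampling error, and a public average built from 5 polls. The names released_gap and analytic_gap are made up for this example, and d_public stands for dpublic. Under these assumptions (internal poll minus public average) is normal with mean zero, so the polls A chooses to release exceed the public average, on average, by the half-normal mean sd * sqrt(2/pi).

```python
import math
import random


def released_gap(poll_sd=4.0, n_public=5, trials=100_000, seed=1):
    """Mean of (released internal poll - public average) for campaign A."""
    rng = random.Random(seed)
    true_lead = 0.0  # hypothetical true lead of A, in points
    gaps = []
    for _ in range(trials):
        # Public average: mean of n_public unbiased polls.
        d_public = sum(rng.gauss(true_lead, poll_sd) for _ in range(n_public)) / n_public
        internal_a = rng.gauss(true_lead, poll_sd)  # unbiased internal poll
        if internal_a > d_public:  # A releases only if it beats the public average
            gaps.append(internal_a - d_public)
    return sum(gaps) / len(gaps)


def analytic_gap(poll_sd=4.0, n_public=5):
    """Half-normal mean of the difference between one poll and the public average."""
    diff_sd = poll_sd * math.sqrt(1 + 1 / n_public)
    return diff_sd * math.sqrt(2 / math.pi)


simulated = released_gap()
expected = analytic_gap()
print(f"simulated gap {simulated:.2f}, analytic gap {expected:.2f}")
```

With these assumed numbers the analytic value is roughly 3.5 points, so selection alone, with no cooking of any individual poll, can produce a gap of several points between released internals and public polls.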
Claim 1: Then the average including internal polls is an unbiased estimate of the expected value of the actual vote.
Proof: everything is symmetric around A ahead by dpublic, so A’s expected lead including the internal polls is dpublic, which is assumed to be unbiased.
Claim 2: the variance of the average including the internal polls will be lower.
Non proof: This is obvious: it is an average with an equal or larger denominator.
So what are the key assumptions? First, I don’t really need to assume one internal poll each, but I do need to assume equal numbers of internal polls run by both campaigns. Second, I need more for symmetry: I need the internal polls to all be unbiased and to have equal variance. Also, I need the rule for publishing them to be the same for both campaigns.
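Both claims can be checked numerically rather than taken on trust. The sketch below is an illustration under assumed numbers (normal sampling error, 4 points of standard deviation per poll, 5 public polls, true lead 0), not a proof, and simulate is a hypothetical helper. It tracks the error of the public-only average and of the average that adds whichever internals are released under the rule above.

```python
import random


def simulate(poll_sd=4.0, n_public=5, trials=100_000, seed=2):
    """Return (mean error, mean squared error) for the public-only average
    and for the average that also counts the released internal polls."""
    rng = random.Random(seed)
    true_lead = 0.0  # hypothetical true lead of A, in points
    err_public, err_with = [], []
    for _ in range(trials):
        public = [rng.gauss(true_lead, poll_sd) for _ in range(n_public)]
        d_public = sum(public) / n_public
        poll_a = rng.gauss(true_lead, poll_sd)  # A's unbiased internal
        poll_b = rng.gauss(true_lead, poll_sd)  # B's unbiased internal
        polls = list(public)
        if poll_a > d_public:  # A releases good news for A
            polls.append(poll_a)
        if poll_b < d_public:  # B releases good news for B
            polls.append(poll_b)
        err_public.append(d_public - true_lead)
        err_with.append(sum(polls) / len(polls) - true_lead)

    def summary(errs):
        return sum(errs) / len(errs), sum(e * e for e in errs) / len(errs)

    return summary(err_public), summary(err_with)


(public_bias, public_mse), (with_bias, with_mse) = simulate()
print(f"public only:    mean error {public_bias:+.3f}, MSE {public_mse:.3f}")
print(f"with internals: mean error {with_bias:+.3f}, MSE {with_mse:.3f}")
```

If the symmetry argument is right, the mean error with internals should sit near zero, and comparing the two mean squared errors shows whether the extra polls help in this particular setup.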
Silver seems inclined to assume, for the sake of argument, that internal polls are generally reputable polls with bias due to publication bias. It seems to me that the key issue is the assumption that each side actually performs the same number (really approximately the same number).
This is clearly not true if a well financed campaign competes with a poorly financed campaign. This means that results including internal polls will depend partly on current public opinion and partly on which campaign has more money. The same is true of election outcomes. I’d consider that a feature not a bug.
My reasoning is used in informal discussions. The fact that a campaign hasn’t released an internal is interpreted as bad news for that campaign. It is assumed that they ran polls and didn’t like the results. Consider what Steve Singiser just typed.
Meanwhile, Louisiana looks competitive for the first time in months, at least according to a Dem poll. As of this afternoon, we haven’t seen the retaliatory GOP internal poll, suggesting that the Dem poll might at least be in the wheelhouse.
No one seems to doubt that this argument is valid, yet it is excluded from Silver’s model and from the more primitive polling averages presented by others.
update 2: Silver’s proposed 6% correction ignores this factor. Now ignoring all internal polls will not introduce bias, but including internal polls with the 6% correction will introduce bias in my example in the case in which only one internal poll is reported.
end update 2.
I think my alleged proof is rigorous, but it is very brief and I will tell stories to explain.
There are four possibilities
1. Both internals give better results for A than the consensus based on independent polls
2. Both internals give better results for B
3. The poll conducted by A gives better results for A and the poll conducted by B gives better results for B
4. The poll conducted by A gives better results for B and the poll conducted by B gives better results for A
In case 4, no internal polls are released so the average including (hypothetical reported internals) is just the average based on independent polls.
In case 3 each internal poll is published. Each is biased (by selection), and they are equally biased in opposite directions. The final average is unbiased and based on a larger sample (with a larger denominator) so it has lower variance than the average independent poll.
The interesting cases are 1 and 2. I will explain case 1, since they are symmetric. In case 1 one poll is included in the average. Its distribution is biased by selection. However, the fact that the poll run by campaign B gave a lead for A greater than dpublic is not included at all in calculating the average. The two biases exactly cancel. It is sufficient but not necessary to note that the distributions of (internal poll minus average of independent polls) are symmetric around 0. It is necessary that the two internal polls have the same distribution (the bit about equal quality).
update 3: I haven’t shown you the proof of this new claim. It actually adds something to the obvious claim based on symmetry. It means that not only is there no unconditional bias, but also there is no bias conditional on being in case 1 and similarly there is no bias in case 2. This means that Silver’s proposed “first cut” is not the correct formula given Silver’s data. The fact that one campaign is not releasing internal poll results means something and the 6% correction is based on the assumption that it means exactly nothing. I’d say a better first cut is 0%, but I am absolutely sure that a 5% correction is better than a 6% correction (I’m willing to bet a modest amount of money on it using exactly the same data Silver used to calculate the 6%).
end update 3.
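The case 1 comparison between a 0 and a 6 point correction can also be probed by simulation. This is a sketch under assumptions that are not in the post: normal sampling error, 4 points of standard deviation per poll, a public average of 5 polls that is itself a noisy estimate, and a true lead of 0. The function name case1_mean_error is hypothetical. Whether the cancellation is exact depends on how the public average is modelled, so the sketch simply reports the mean error in case 1 under each correction and leaves the comparison to the reader.

```python
import random


def case1_mean_error(correction, poll_sd=4.0, n_public=5, trials=100_000, seed=3):
    """Mean error of the average in trials where only A's internal is released,
    that is, both internals came in above the public average (case 1)."""
    rng = random.Random(seed)
    true_lead = 0.0  # hypothetical true lead of A, in points
    errors = []
    for _ in range(trials):
        public = [rng.gauss(true_lead, poll_sd) for _ in range(n_public)]
        d_public = sum(public) / n_public
        poll_a = rng.gauss(true_lead, poll_sd)
        poll_b = rng.gauss(true_lead, poll_sd)
        if poll_a > d_public and poll_b > d_public:  # case 1: B stays silent
            adjusted = poll_a - correction  # shift A's internal toward B
            average = (sum(public) + adjusted) / (n_public + 1)
            errors.append(average - true_lead)
    return sum(errors) / len(errors)


# Same seed for both runs, so the two corrections see identical polls.
results = {c: case1_mean_error(c) for c in (0.0, 6.0)}
for c, err in results.items():
    print(f"correction {c:.0f}: mean error in case 1 = {err:+.2f}")
```

By construction the two results differ by exactly correction / (n_public + 1), so the only question the simulation answers is which of them lands closer to zero in this setup.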
Now someone at DailyKos noted (I think it was Steve Singiser to whom I linked above) that averages including internals performed better than averages of independent polls in past cycles. I claim that I have an explanation of that fact which is perfectly consistent with the very significant bias in publicly reported internals.
Second, my argument for including internals will cease to hold if it is adopted. If it is known that internal polls are believed, the incentive to deliberately bias them increases until the effect of including them becomes a bias in favor of the less honest campaign. No campaign is very honest, but they do differ, so, if indeed they are not cooking the internal polls now, it would be better if people ignored the internal polls at the cost of not knowing so well how elections will turn out.
Not being Kant, I will include internal polls in my averages.
update 4: I haven’t proven any of my mathematical claims after the first (at most) but I will add one more for the patient reader. Even if internal polls are biased (not just reported internal polls but also the numbers which might or might not be biased) they should be included if both campaigns have the same bias and the same symmetric distribution around that biased mean, both know the bias and both report the poll if it is better for their candidate than they expected given the bias.
The same argument as in the non proof of the claims of no bias in cases 1 and 2 works just as well.
A simpler way to put the new claim is that if half of internal polls are reported, and those are the half which are better for the campaign which financed the polling, then the best approach is to include internal polls in the average just as if they were independent. This is true even if internal polls are biased (and it’s not just that published internal polls are biased due to publication bias).
If more than half of internal polls are reported on average, then the simple average UNDERestimates the expected vote share of the campaign which reports the internal poll. The fact that the other campaign didn’t report its poll contains lots and lots of information if campaigns usually report their internal polls. More generally, for many internal polls run but equal numbers by each campaign, this means that my 0% correction for internal polls is too high if most internal polls are reported.
This is always assuming a lot of symmetry, importantly symmetry in the number of internal polls run and symmetric bias (indeed mirror image distributions) and symmetric rules for what to publish.
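The direction of this effect can be explored by loosening the release rule in a simulation. The sketch below is illustrative only and rests on assumptions of my own: normal sampling error, 4 points of standard deviation per poll, a noisy public average of 5 polls, a true lead of 0, and a hypothetical release_shift parameter. A releases its poll if it beats the public average minus release_shift, and B mirrors that rule, so a shift of 0 means about half of internals are released and a positive shift means more than half are.

```python
import random


def only_a_released_error(release_shift, poll_sd=4.0, n_public=5,
                          trials=100_000, seed=4):
    """Mean error of the simple average in trials where A releases and B does not."""
    rng = random.Random(seed)
    true_lead = 0.0  # hypothetical true lead of A, in points
    errors = []
    for _ in range(trials):
        public = [rng.gauss(true_lead, poll_sd) for _ in range(n_public)]
        d_public = sum(public) / n_public
        poll_a = rng.gauss(true_lead, poll_sd)
        poll_b = rng.gauss(true_lead, poll_sd)
        a_releases = poll_a > d_public - release_shift
        b_releases = poll_b < d_public + release_shift
        if a_releases and not b_releases:
            # Simple average: A's internal counted like any public poll.
            average = (sum(public) + poll_a) / (n_public + 1)
            errors.append(average - true_lead)
    return sum(errors) / len(errors)


err_half = only_a_released_error(0.0)  # about half of internals released
err_more = only_a_released_error(2.0)  # more than half released
print(f"half released: {err_half:+.2f}, more than half released: {err_more:+.2f}")
```

A negative number means the simple average understates A's lead when B stays silent. Comparing the two values shows how B's silence becomes more informative as campaigns release a larger share of their polls.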
The focus on polling results is becoming a significant deterrence to a democratic election result. Internal or external polling is subject to the bias of the poll conductor. The media acts as though polling results are god’s words on Earth. The play of the lead held by one side or another influences the voters and results in a self fulfilling analysis. Unfortunately the average voter has the tendencies of a lemming.
I agree with Nate Silver. And your closing paragraph supports Silver’s approach.
MG, makes a great point.
I disbelieve this election cycle polling because the election cycle is different enough from others such that it is difficult to make the correct assumptions to define the “likely voters” mix.
What I have seen is that internal polls spend their money on the selection of targeted callees, and creation of the appropriate questions. Otherwise the polling is actually done by volunteers. The bottom line, therefore, is internal polls may be quite different in both goals and quality from professional/independent polls.
So, Silver is probably more correct than you in ignoring internal polls.
It supports the claim that the world would be a better place if everyone followed Silver’s approach, which is the standard approach. But my post was about how to predict who will win elections, not about how we, all acting collectively, could improve the democratic process if we all followed the categorical imperative.
These are two completely different questions. I will consider internal polls and hope that most other people don’t.
You argue that Silver is incorrect as he claims that internal polls are conducted by reputable pollsters. His argument was that each internal poll is about as good as an independent poll but published internals are selected causing publication bias, and so they should be ignored. That argument is incorrect. I admit I didn’t provide a link but I will now look for a link and quote for an update.
One view might be that otherwise reputable pollsters cook books when hired by campaigns. I think this is unlikely. They have a lot to lose. I don’t think this is your view.
I think your view is that campaigns run polls themselves or by hiring hack firms and those polls should be ignored. I think I agree with you. I don’t actually know of a poll run by campaign staff, so I just interpret you as saying there are some bogus hack pollsters. This is known to be true, but it is a separate issue.
I was working within the assumption that internal polls are like independent polls except for the publication decision (this was assumed by Silver in the post to which I neglected to link). To deal with the real world, I re-interpret (partially retract) my proposal to say pay attention to a poll financed by a campaign if the pollster also runs independent polls and doesn’t have a clear bias or terrible variance in those independent polls (provided there are about equally many internal polls for Republicans and Democrats run (not necessarily published) by such pollsters).
My guess is that pollsters who deliberately cheat are rare. The scandals are fairly rare (I recall two) and news of deliberately slanting samples would be hot news.
It seems you know more than I do. I think you should describe what you saw and where. I don’t know how to trace CoRev to your name and address, so I think you can consider yourself anonymous and morally obliged to report when and where you witnessed fraud. To be clear, phrasing the question isn’t fraud (the question is published) but any targeting of calls with an aim other than representativeness is fraud. If you have witnessed it as you claim, you should denounce it. If by “what I have seen” you didn’t mean what you have seen, with your eyes, you should be ashamed to even discuss the concept of integrity.
To be clear, in my view, phrasing the question isn’t fraud even if the question is slanted (the question is published) but any targeting of calls with an aim other than representativeness is fraud, and I think it is not often done by well known pollsters even when they are paid by campaigns.
I am going to make absolutely key unsupported assumptions in order to write down a model. Then I will pretend the model is reality.
This, all by itself, qualifies you to be a Chicago School economist.
Cheers!
JzB
Robert said: “I don’t actually know of a poll run by campaign staff, …” I do and have taken part in them, plus numerous surveys. They are used to get a feel for the campaign’s competitive position.
You also said: “I was working within the assumption that internal polls are like independent polls except for the publication decision …“. From my own experience, your assumption about internal polls is quite incorrect, especially for state level and lower offices and for poorly financed campaigns.
Also saying this: “…but any targeting of calls with an aim other than representativeness is fraud, and I think it is not often done by well known pollsters even when they are paid by campaigns. ” may be a little naive. The target audience is the major issue/problem with polls. That’s why I started my comment with: “I disbelieve this election cycle polling because the election cycle is different enough from others such that it is difficult to make the correct assumptions to define the “likely voters” mix. ” I’m just not sure the pollsters are quite clued in to the “likely voter” mix yet.
Another week will tell.
CLARIFICATION: I agree with Nate Silver’s Senate model (internal polls are excluded). I am not certain how the House model works, as he does include some internal polls. I’m not comfortable with the approach used for House candidates, absent more details. Silver’s FiveThirtyEight methodology is unclear on all House model specifics.
Robert’s opening statement is a bit misleading: “Nate Silver, like essentially all election handycappers, ignores internal polls — polls financed by one of the candidates or by the party of one of the candidates.”
Silver stated on October 10, “As of today — with about 25 days remaining in the campaign — our database has some kind of polling (including polls released by campaigns) in 150 House districts.”
http://fivethirtyeight.blogs.nytimes.com/2010/10/10/number-of-competitive-house-races-doubles-from-recent-years/
Silver’s FiveThirtyEight Methodology
http://fivethirtyeight.blogs.nytimes.com/methodology/
Also to the point raised by me above “The scandals are fairly rare (I recall two)”. The two are Strategic Vision and Research 2000. Seem is an understatement. The evidence amounts, in my view, to proof beyond reasonable doubt.
Notably, Nate Silver included Research 2000 in his calculations until they were caught. They are almost certainly fraudsters, but they were, in Silver’s view, independent. His predictions suddenly changed when they were detected.
A case of an independent pollster which appears to have made up numbers does not support the claim that independent pollsters are of higher quality than internal pollsters. The claim that they made up numbers is based on the facts that their numbers contained patterns typical of numbers people make up when trying to make up numbers which look stochastic and that no other process which generates such patterns has ever been described or detected.
But see, those are polls which Silver included, because Research 2000 was an independent pollster. He sure wasn’t a fan of theirs. He advised Markos Moulitsas to fire them (not because he suspected out and out fraud). He also first alleged and basically proved that Strategic Vision (a Republican house pollster) was fraudulent, so the current score is one independent fraudster and one internal fraudster and essentially no evidence that one group is more reliable than the other.
I don’t consider Zogby to even claim that Zogby interactive is a poll. In any case an absurd methodology is not fraud if you admit it. And Zogby is an independent pollster, which would bring the count to two independent bogus pollsters to one internal.
I have enormous respect for Nate Silver. If I didn’t I wouldn’t post every time I disagree with anything he writes.
But I said so Jazz. Saying that this is what I did disqualifies me. All economists make silly models. The Chicago school is defined by rhetorical tricks designed to avoid typing what I typed.
Robert – “I have enormous respect for Nate Silver. If I didn’t I wouldn’t post every time I disagree with anything he writes.”
Apparently not enough to correct your lead paragraph. Silver does not include internal polls in his Senate model. He has explained that.
Exactly so. Chicago, among others.