Unskewing 538

Robert Waldmann | September 21, 2016 11:32 am

Journalism

Politics

Different data journalists have different estimates of the probability that Hillary Clinton will be elected. The numbers will be updated, so I type the current ones before discussing.

Five Thirty Eight polls-plus 55.7%
Five Thirty Eight polls-only 57.1%
Daily Kos 63%
Upshot 75%
Princeton Election Consortium (Sam Wang), random drift 71%
Princeton Election Consortium (Sam Wang), Bayesian 81%

As usual Nate Silver and Sam Wang are the extremes with Silver estimating probabilities closer to 50%. This happens mostly because Silver estimates distributions of parameters which Wang assumes to be known constants. I have to admit I generally agree with Silver.
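
A toy illustration of why treating a parameter as uncertain pulls the probability toward 50%. This is neither Silver's nor Wang's actual model, and the 3-point lead and 3-point standard deviations are made-up numbers. Suppose the final margin is normal around the polled lead. If the polls may also share an unknown bias, itself normal with mean zero, the variances add and the same lead is worth fewer standard deviations:

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def win_probability(lead, sd):
    """P(final margin > 0) when the margin is Normal(lead, sd)."""
    return STD_NORMAL.cdf(lead / sd)


def win_probability_uncertain_bias(lead, sd, bias_sd):
    """Same, but with an independent unknown polling bias ~ Normal(0, bias_sd).

    The sum of two independent normals is normal, so the variances add.
    """
    return STD_NORMAL.cdf(lead / sqrt(sd ** 2 + bias_sd ** 2))


# Hypothetical inputs, chosen only for illustration.
known = win_probability(lead=3.0, sd=3.0)
uncertain = win_probability_uncertain_bias(lead=3.0, sd=3.0, bias_sd=3.0)

print(f"bias assumed known to be zero: {known:.1%}")
print(f"bias treated as uncertain:     {uncertain:.1%}")
```

The second number is always closer to 50% than the first whenever the lead is nonzero and `bias_sd` is positive, which is the qualitative pattern in the list above.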

Silver explained at least part of the difference with other aggregators:

High numbers of undecided and third-party voters are associated with higher volatility and larger polling errors. Put another way, elections are harder to predict when fewer people have made up their minds. Because FiveThirtyEight’s models account for this property, we show a relatively wide range of possible outcomes, giving Trump better odds of winning than most other statistically based models, but also a significant chance of a Clinton landslide if those undecideds break in her favor.

This is a bit reassuring to me. I think there are a lot of #NeverTrump voters who are very unenthusiastic about Clinton. These are voters who say he is unqualified and temperamentally unsuited to be President. I tend to guess many of them will reluctantly vote for Clinton if and only if it seems necessary, and otherwise stay home or vote 3rd or 4th party. I remember 2000 (and some of these voters don’t) but I am not as alarmed as I would be without this argument.

The point (if any) of this post is that fivethirtyeight normalizes polls in which only Trump and Clinton are named to the standard of polls in which Johnson & Stein are also named. They will be on the ballot, but this seems to me to be a mistake. Respondents can volunteer that they will vote for another person if asked to choose between Trump and Clinton. I think the pressure due to naming only Trump and Clinton is weaker than the pressure of an upcoming election and fear of wasting a vote. So I’d guess polls which name only 2 candidates give more accurate forecasts. I think this is historically true (sorry no link). Certainly declared support for 3rd party candidates in September polls regularly vastly exceeds actual votes for 3rd party candidates.

I don’t know the fivethirtyeight correction term (sorry, I could probably find it there if I looked). My impression is that Clinton averages 1 or 2% better in polls which name only Trump and her. Currently The Huffington Post says 1% nationwide: Clinton is 4% ahead in 2-name polls and 3% ahead in 3-name polls including Johnson (including Stein has to hurt Clinton). Given the confidence interval and the fact that all aggregators assume normality, a 1% difference in means corresponds to about the difference between 57% and 75% (this is a very rough BS pseudo calculation).
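
The rough calculation can be made slightly less rough. A minimal sketch, assuming only that the final margin is normally distributed around the forecast mean (the function names are mine, and this is not any aggregator's actual model): ask what standard deviation would make a 1-point shift in the mean move the win probability from 57% to 75%.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def win_probability(mean_margin, sd):
    """P(final margin > 0) when the margin is Normal(mean_margin, sd)."""
    return STD_NORMAL.cdf(mean_margin / sd)


def implied_sd(p_low, p_high, shift):
    """SD for which raising the mean by `shift` moves P(win) from p_low to p_high."""
    z_gap = STD_NORMAL.inv_cdf(p_high) - STD_NORMAL.inv_cdf(p_low)
    return shift / z_gap


# The 57%, 75% and 1-point figures come from the paragraph above.
sd = implied_sd(0.57, 0.75, shift=1.0)

# Mean margin that gives a 57% win probability at that SD.
mean_57 = STD_NORMAL.inv_cdf(0.57) * sd

print(f"implied SD of the final margin: {sd:.2f} points")
print(f"P(win) after a 1-point shift:   {win_probability(mean_57 + 1.0, sd):.1%}")
```

Under this assumption the implied standard deviation is roughly 2 points. With a larger standard deviation, the same 1-point shift starting from 57% would move the probability by less.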

An even more striking pattern over at The Huffington Post is that the fitted curve for Clinton vs. Trump vs. Johnson is much smoother than the fitted curve for Clinton vs. Trump. This is partly due to their smoothing algorithm, which smooths more if there are few data points (it is a compromise between not wanting to use very few points and not wanting to use very old data). But eyeballing, I am fairly sure it isn’t just that. Also the moderately smoothed Clinton support in 2-way polls varies more (including conventions, roughly 44-48 for Clinton and 40-42 for Trump). I think this shows a lot of the variance is in the willingness of #NeverTrumpers to say they will vote for Clinton if pressed.

So after psychoanalyzing data analysis, I conclude that the key issue is whether people who think Trump should not be elected, but don’t want to vote for Clinton, end up reluctantly voting for Clinton. What an original thought. Bet no one has written that already in 2016.

5 Comments

Bruce Webb says:

September 21, 2016 at 1:23 pm

I have some issues with 538, one which Silver acknowledges himself (though don’t blame him for my framing).

In short his methodology assumes there is no inherent stickiness when results move to the edge of the previous margin. For statistical purposes it is as if the world is new again. Now these ARE his words (emphasis mine):

Our various models differ on this question. Polls-only assumes that there’s still a lot of uncertainty about the outcome. But it also mostly assumes that the current condition of the race — Clinton ahead by around points — is a statistically unbiased prediction of the Nov. 8 outcome. In other words, it assumes that Clinton is as likely to continue losing ground as opposed to regaining ground from this point forward.

Polls-plus, by contrast, discounts short-term shifts in the polls by hedging them with an index based on economic conditions.

Opposed to both would be a model that builds in support ceilings and floors, or at least assumes some stickiness for results to that side of the distribution. Nate explicitly puts that aside.

bob says:

September 21, 2016 at 1:31 pm

“I tend to guess many of them will reluctantly vote for Clinton if and only if it seems necessary and otherwise stay home [or] vote 3rd or 4th party.”

You state my position exactly. I REALLY do not want to vote for Clinton. Since I live in a deep blue state (Illinois) it seems highly unlikely that I would need to – she would need to screw up her campaign even more than she has already done to make it close here. IF just before the election the race in Illinois is so close that a Donald victory seems possible, I would hold my nose and vote Clinton – though such a bizarre situation could only mean that she had irretrievably screwed up her campaign nationwide.
Otherwise, I will vote for Jill Stein, to send a message (that will be ignored by the Democratic powers-that-be, of course). I’d prefer to write in Bernie, but since he isn’t a declared write-in candidate, in Illinois that write-in wouldn’t be counted or even recorded.

Eric377 says:

September 21, 2016 at 4:19 pm

So long as a method keeps it between 10% and 90%, who is later going to say that it was a bad methodology, regardless of who wins? We all know that even longer odds come through now and then. It would be a much more interesting discussion of methodology if they used it to predict the popular vote, for example. That’s why most of the handle in sports betting is not on who wins but against the spread or, in the case of horses, your payout is scaled even though the bet is just positional.

William Ryan says:

September 22, 2016 at 10:20 am

I don’t know if you folks read anything else, but I recently read that Google has visited the White House over 417 times in the past few years. Why? How much power and control has the government given Google? Some claim that the folks actually writing and controlling the algorithms can skew data by prejudicing the algorithms' inputs and outputs. Is there any proof or factual evidence that Google is now running the country? Can they sway voters’ attitudes, opinions and outcomes? Who knows or cares? What “other” uses does the government have for Google besides military use?

ilsm says:

September 24, 2016 at 7:55 am

Lest you forget the never Clinton voters, and the never planned parenthooders.

I will vote for no democrat with Clinton on the ticket.