Unskewing 538

Different data journalists have different estimates of the probability that Hillary Clinton will be elected. The numbers will be updated, so I record the current ones before discussing them.

FiveThirtyEight (polls-plus): 55.7%
FiveThirtyEight (polls-only): 57.1%
Daily Kos: 63%
Upshot: 75%
Princeton Election Consortium (Sam Wang), random drift: 71%
Princeton Election Consortium (Sam Wang), Bayesian: 81%

As usual, Nate Silver and Sam Wang are at the extremes, with Silver’s probabilities closer to 50%. This happens mostly because Silver estimates distributions for parameters that Wang treats as known constants. I have to admit I generally agree with Silver.
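A minimal sketch of why treating a parameter as uncertain pulls the probability toward 50%. It assumes the final margin is normally distributed and that the extra uncertainty from an estimated parameter (say, a systematic polling error) is independent of sampling error, so the two variances add. All the numbers are hypothetical and are not Silver’s or Wang’s actual inputs.

```python
from statistics import NormalDist

def win_probability(lead, sd):
    """P(final margin > 0) when the margin is Normal(lead, sd)."""
    return 1.0 - NormalDist(mu=lead, sigma=sd).cdf(0.0)

lead = 3.0           # hypothetical polling lead, percentage points
sampling_sd = 2.0    # hypothetical sampling error alone (parameter treated as known)
systematic_sd = 3.0  # hypothetical extra spread from treating the parameter as estimated

# "Known constant" view: only sampling error.
p_known = win_probability(lead, sampling_sd)
# "Estimated parameter" view: independent variances add in quadrature.
p_uncertain = win_probability(lead, (sampling_sd**2 + systematic_sd**2) ** 0.5)

print(f"known: {p_known:.3f}  uncertain: {p_uncertain:.3f}")
```

The same lead yields a lower probability once the extra variance is included, because the wider distribution puts more mass below zero.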

Silver has explained at least part of the difference between his forecast and those of other aggregators:

High numbers of undecided and third-party voters are associated with higher volatility and larger polling errors. Put another way, elections are harder to predict when fewer people have made up their minds. Because FiveThirtyEight’s models account for this property, we show a relatively wide range of possible outcomes, giving Trump better odds of winning than most other statistically based models, but also a significant chance of a Clinton landslide if those undecideds break in her favor.

This is a bit reassuring to me. I think there are a lot of #NeverTrump voters who are very unenthusiastic about Clinton. These are voters who say Trump is unqualified and temperamentally unsuited to be President. My guess is that many of them will reluctantly vote for Clinton if, and only if, it seems necessary, and will otherwise stay home or vote third or fourth party. I remember 2000 (and some of these voters don’t), but this argument makes me less alarmed than I would otherwise be.

The point (if any) of this post is that FiveThirtyEight adjusts polls in which only Trump and Clinton are named to match polls in which Johnson and Stein are also named. Johnson and Stein will be on the ballot, but this adjustment seems to me to be a mistake. Respondents can volunteer that they will vote for someone else even when asked to choose between Trump and Clinton. I think the pressure created by naming only Trump and Clinton is weaker than the pressure of an upcoming election and the fear of wasting a vote. So I’d guess that polls naming only two candidates give more accurate forecasts. I think this is historically true (sorry, no link). Certainly, declared support for third-party candidates in September polls regularly far exceeds their actual vote.

I don’t know FiveThirtyEight’s correction term (sorry, I could probably find it on the site if I looked). My impression is that Clinton does 1 or 2 points better on average in polls that name only her and Trump. Currently The Huffington Post puts the gap at 1 point nationwide: Clinton leads by 4 points in two-name polls and by 3 points in three-name polls that include Johnson (and including Stein has to hurt Clinton). Given the width of the forecast distributions and the fact that all the aggregators assume normality, a 1-point difference in the mean margin corresponds to roughly the difference between 57% and 75% (this is a very rough BS pseudo-calculation).
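One way to check the rough calculation: assume, as a simplification that ignores the Electoral College, that the win probability is Phi(mean margin / sd), where Phi is the standard normal CDF. Then a shift in the mean moves Phi^-1(probability) by shift / sd, so we can solve for the forecast standard deviation that would make a 1-point shift carry 57% to 75%. This is a sketch under that assumption, not any aggregator’s actual model.

```python
from statistics import NormalDist

z = NormalDist()          # standard normal
p_low, p_high = 0.57, 0.75
shift = 1.0               # 1-point difference in the mean margin

# Under P(win) = Phi(mean / sd), the z-scores differ by shift / sd.
implied_sd = shift / (z.inv_cdf(p_high) - z.inv_cdf(p_low))
print(round(implied_sd, 2))
```

The answer is about 2 points, so the 57%-to-75% figure holds only if the forecast’s standard deviation on the margin is around 2 points; with a wider distribution the same 1-point shift moves the probability less.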

An even more striking pattern over at The Huffington Post is that the fitted curve for Clinton vs. Trump vs. Johnson is much smoother than the fitted curve for Clinton vs. Trump. This is partly due to their smoothing algorithm, which smooths more when there are few data points (a compromise between not wanting to rely on very few points and not wanting to use very old data). But eyeballing the charts, I am fairly sure that isn’t the whole story. Also, even moderately smoothed, Clinton’s support in two-way polls varies a lot: including the conventions, roughly 44-48% for Clinton against 40-42% for Trump. I think this shows that much of the variance comes from the willingness of #NeverTrumpers to say they will vote for Clinton if pressed.
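The post doesn’t give The Huffington Post’s actual algorithm, so here is only a hypothetical sketch of the trade-off described: a moving average whose window widens until it contains a minimum number of polls. With sparse polling the window gets wide, which averages over more days and produces a smoother curve. The function name, parameters, and toy data are all illustrative assumptions.

```python
from statistics import fmean

def adaptive_window_mean(polls, day, min_polls=5, start_halfwidth=3, max_halfwidth=60):
    """Hypothetical smoother: average the polls within `halfwidth` days of `day`,
    widening the window one day at a time until it holds at least `min_polls`
    polls or reaches `max_halfwidth`. Returns (average, halfwidth used)."""
    halfwidth = start_halfwidth
    while True:
        window = [value for d, value in polls if abs(d - day) <= halfwidth]
        if len(window) >= min_polls or halfwidth >= max_halfwidth:
            break
        halfwidth += 1
    average = fmean(window) if window else float("nan")
    return average, halfwidth

# Toy data: (day, Clinton share) pairs, not real polls.
dense = [(d, 45.0 + d % 3) for d in range(61)]   # a poll every day
sparse = [(d, 45.0) for d in range(0, 61, 10)]   # a poll every 10 days

avg_dense, w_dense = adaptive_window_mean(dense, 30)
avg_sparse, w_sparse = adaptive_window_mean(sparse, 30)
print(w_dense, w_sparse)  # prints 3 20
```

The sparse series needs a 20-day half-width to reach five polls, versus 3 days for the dense one, which is why a smoother of this kind flattens a thinly polled matchup more.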

So after psychoanalyzing data analysis, I conclude that the key issue is whether people who think Trump should not be elected, but don’t want to vote for Clinton, end up reluctantly voting for her. What an original thought. Bet no one has written that already in 2016.