Does fivethirtyeight process the numbers too much?

Or not enough. Nate Silver has long performed fairly complicated calculations with (lots of) raw data. Even back when he posted as Poblano at DailyKos, this was highly controversial. Now that he leads a huge team at fivethirtyeight.com, it is almost necessary just to decide whether to trust them, because reading their explanations of their algorithms is very time-consuming.

In particular, obsessive poll watchers, such as your humble correspondent, have noted that Senate ratings by fivethirtyeight.com and realclearpolitics.com don’t always move together. Very often the difference between the complicated calculations at fivethirtyeight.com and the simple averages at realclearpolitics.com is greater than the week-to-week change in either.

One very important (and controversial) aspect of the fivethirtyeight.com approach is correcting raw polls for house effects. The corrected number, which goes into the smoother used to make forecasts, is a prediction of what a randomly chosen pollster would report (with pollsters more likely to be chosen if they have performed well in the past). So the calculation first turns each observed poll into an estimate of the (quality-rating-weighted) central tendency of polls. These estimates are then averaged, with slightly more weight on more recent polls, to give the forecast.
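To make the two-step procedure concrete, here is a minimal sketch in Python. It is not fivethirtyeight’s algorithm: as simplifying assumptions, each house effect is just a pollster’s mean deviation from the average of all polls, pollster quality weights are omitted, and the recency weighting uses a hypothetical half-life parameter (half_life_days).

```python
from collections import defaultdict
from statistics import mean

def house_effects(polls):
    """Toy house-effect estimate (an assumption, not fivethirtyeight's method):
    each pollster's mean deviation from the average of all polls.
    `polls` is a list of (pollster, days_ago, dem_lead) tuples."""
    overall = mean(lead for _, _, lead in polls)
    by_firm = defaultdict(list)
    for firm, _, lead in polls:
        by_firm[firm].append(lead - overall)
    return {firm: mean(devs) for firm, devs in by_firm.items()}

def corrected_average(polls, half_life_days=14.0):
    """Subtract each pollster's house effect, then average the corrected leads
    with weights that halve every `half_life_days` (a hypothetical choice)."""
    effects = house_effects(polls)
    num = den = 0.0
    for firm, days_ago, lead in polls:
        w = 0.5 ** (days_ago / half_life_days)  # more recent polls weigh more
        num += w * (lead - effects[firm])
        den += w
    return num / den

polls = [("A", 0, 4.0), ("A", 7, 6.0), ("B", 3, 10.0), ("B", 10, 12.0)]
print(house_effects(polls))       # {'A': -3.0, 'B': 3.0}
print(corrected_average(polls))
```

With these toy numbers both firms’ corrected leads become 7 and 9, and because the 7s are the more recent polls, the weighted average lands a little below 8.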

The justification is *not* based on the claim that the (weighted) average poll is unbiased; such bias can’t be estimated. Rather, the claim is that house effects can be removed and that removing them reduces the noise in the smoothed average. They report probabilities relatively close to 50-50 (compared, at least, to bug-eating non-welcher Sam Wang) because they allow a random term for the unknown common bias.
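The effect of that random common-bias term can be sketched in a few lines, assuming (purely for illustration) that the true lead is normally distributed around the forecast with independent sampling-noise and common-bias components. The function name and the numbers used below are hypothetical, not fivethirtyeight’s parameters.

```python
import math

def win_probability(lead, noise_sd, common_bias_sd=0.0):
    """P(true lead > 0) under the illustrative assumption that the true lead is
    Normal(lead, noise_sd**2 + common_bias_sd**2). A common-bias term widens
    the distribution and so pulls the probability toward 0.5."""
    total_sd = math.sqrt(noise_sd ** 2 + common_bias_sd ** 2)
    # Standard normal CDF written with the error function
    return 0.5 * (1.0 + math.erf(lead / (total_sd * math.sqrt(2.0))))

print(win_probability(2.0, 1.0))        # without common bias
print(win_probability(2.0, 1.0, 3.0))   # with a common-bias SD of 3
```

For a 2-point forecast lead with noise SD 1, this gives about 0.98 without the common-bias term and about 0.74 with a common-bias SD of 3.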

I think the way to test such a claim is to try to predict not the outcome (which occurs only every other year) but the next poll (which is biased only by the unmeasurable common bias).

My impression is that they are too cautious when removing house effects: the corrected numbers deviate from the others in the same direction as the uncorrected numbers. For example, Rasmussen reports polls which (compared to the others, not to the unknown truth) are biased toward Republicans. The corrected Rasmussen numbers fivethirtyeight uses also tend to look unusually good for Republicans. This would happen if reluctance to process the numbers too much led to smaller corrections than would be optimal (that is, than would minimize forecast error).
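A toy model shows why under-correction leaves corrected numbers on the same side as the raw ones. It assumes, hypothetically, that only a fraction shrink of each house effect is removed, and that house effects are independent of the consensus and of sampling noise; the function names are illustrative.

```python
def corrected_deviation(house_effect, shrink):
    """Deviation of a corrected poll from the consensus, ignoring sampling noise,
    when only the (hypothetical) fraction `shrink` of the house effect is removed."""
    correction = -shrink * house_effect   # correction = corrected - raw
    return house_effect + correction      # = (1 - shrink) * house_effect

def implied_slope(shrink):
    """Slope from regressing corrected on correction in this toy model,
    Cov(C, K) / Var(K) = -(1 - shrink) / shrink, assuming house effects are
    independent of the consensus and of sampling noise."""
    return -(1.0 - shrink) / shrink

# A Republican-leaning firm with a house effect of -4 on the Democratic lead:
print(corrected_deviation(-4.0, 1.0))   # 0.0: full correction
print(corrected_deviation(-4.0, 0.5))   # -2.0: still leans Republican
```

Under these assumptions, the variance-minimizing rescaling of the correction is 1 - implied_slope(shrink) = 1/shrink, so, for example, removing three quarters of each house effect would leave an optimal rescaling of 4/3.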

To test this impression, I took the uncorrected and corrected Democratic leads from generic ballot polls dating from July 1, 2018 onward (I only went back that far because I was doing this by hand and got bored).

There is one very interesting result.

Dependent variable: corrected
Number of obs = 134
R-squared = 0.0589
Root MSE = 2.725

variable | Coef. | t-statistic
correction | -0.473 | (-2.87)
constant | 8.448 | (31.12)

Here corrected is the corrected estimate of the Democrats’ lead that fivethirtyeight uses to forecast outcomes, and correction is that number minus the raw number reported by the pollster.
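For readers who want to reproduce this kind of table, here is a plain-Python sketch of the simple regression and its t-statistics. It assumes conventional (non-robust) OLS standard errors, which may not match the exact options used for the output above; ols_with_t is a hypothetical helper name, and the data in the example are made up.

```python
import math
from statistics import mean

def ols_with_t(x, y):
    """Simple regression y = a + b*x by ordinary least squares.
    Returns (b, a, t_b, t_a) using conventional (non-robust) standard errors."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)   # residual variance; Root MSE = sqrt(s2)
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return b, a, b / se_b, a / se_a

# Made-up data: y = corrected lead, x = correction
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [8.1, 7.4, 7.1, 6.4, 6.1]
print(ols_with_t(x, y))
```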

There is a statistically significant negative correlation between the corrected number and the correction. A correction 1.4733886 times as large as fivethirtyeight’s would minimize the variance of the resulting more_corrected numbers.
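The 1.4733886 figure follows directly from the regression coefficient. Writing $R$ for the raw lead, $C$ for the corrected lead, $K = C - R$ for the correction, and $\lambda$ for a rescaling of the correction (notation introduced here for illustration):

$$
R + \lambda K = C + (\lambda - 1)K,
$$

$$
\operatorname{Var}\big(C + (\lambda - 1)K\big) = \operatorname{Var}(C) + 2(\lambda - 1)\operatorname{Cov}(C, K) + (\lambda - 1)^2 \operatorname{Var}(K).
$$

Setting the derivative with respect to $\lambda$ to zero gives

$$
\lambda^{*} = 1 - \frac{\operatorname{Cov}(C, K)}{\operatorname{Var}(K)} = 1 - \hat{b} = 1 + 0.473 \approx 1.473,
$$

where $\hat{b} = -0.473$ is the in-sample OLS slope of corrected on correction from the table above.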

This correction to the correction remains statistically significant when a lowess-smoothed series (pcorrected, smoothing parameter 0.8) is also included. Rather strikingly (but not to poll obsessives), pcorrected has varied very little: there is no noticeable trend in the polls.

Dependent variable: corrected poll

variable | Coef. | t-statistic
correction | -0.448 | (-2.71)
pcorrected | 1.288 | (1.40)
constant | -2.001 | (-0.27)
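The two-regressor version can be sketched by solving the normal equations directly. This assumes pcorrected has already been computed (the lowess step is not reimplemented here), uses a hypothetical helper named ols, and returns coefficients only, not standard errors.

```python
def ols(columns, y):
    """OLS coefficients for y on the given regressor columns plus a constant,
    solved from the normal equations X'X b = X'y by Gauss-Jordan elimination.
    Returns [constant, b1, b2, ...]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]  # constant first
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for p in range(k):
        # Partial pivoting for numerical stability
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a_rj - f * a_pj for a_rj, a_pj in zip(A[r], A[p])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Made-up data generated as y = 2 + 3*x1 - 1*x2
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 5.0, 3.0]
y = [1.0, 5.0, 6.0, 6.0, 11.0]
print(ols([x1, x2], y))   # approximately [2.0, 3.0, -1.0]
```

With the post’s variables, this would be called as ols([correction, pcorrected], corrected), with each argument a hypothetical list of per-poll values.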

The main explainable part of the variance in the corrected polls is the under-correction for house effects.

OK, so it’s time to admit that I talked about forecasting the next poll. The next (raw) poll in the sample is ldraw.

Dependent variable: next raw (uncorrected) poll, ldraw
Number of obs = 133
R-squared = 0.0414
Root MSE = 3.4038

variable | Coef. | t-statistic
correction | 0.487 | (2.30)
corrected | -0.001 | (-0.01)
constant | 6.833 | (6.91)
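The regression above requires each poll to be matched with the poll that follows it. Here is a sketch of that pairing, assuming polls are stored as dicts with the hypothetical keys date, raw and corrected. The last poll has no successor, which is consistent with 133 observations here versus 134 in the first regression.

```python
def next_poll_pairs(polls):
    """Pair each poll with the raw lead of the poll that follows it in time.
    `polls` is a list of dicts with hypothetical keys 'date' (ISO string),
    'raw' and 'corrected'. Returns (correction, corrected, next_raw) rows."""
    ordered = sorted(polls, key=lambda p: p["date"])
    rows = []
    for cur, nxt in zip(ordered, ordered[1:]):
        correction = cur["corrected"] - cur["raw"]
        rows.append((correction, cur["corrected"], nxt["raw"]))
    return rows

polls = [
    {"date": "2018-07-03", "raw": 7.0, "corrected": 8.0},
    {"date": "2018-07-01", "raw": 5.0, "corrected": 7.5},
    {"date": "2018-07-05", "raw": 9.0, "corrected": 8.5},
]
print(next_poll_pairs(polls))   # [(2.5, 7.5, 7.0), (1.0, 8.0, 9.0)]
```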

Now that is weird. The idea was that, in addition to a coefficient close to one on the lagged corrected poll, there would be a positive coefficient on the correction. This would happen if the polls were corrected too little (too little of a good thing). Instead there is a significant positive coefficient on the correction (consistent with correcting too little) but an almost exactly zero coefficient on the lagged corrected poll (as would happen if there were no true change in voting intentions).

In any case, there is more statistical evidence that the polls aren’t corrected enough by fivethirtyeight.com.

Update: I’ve added more polls (all the generic ballot polls at fivethirtyeight.com, except that I did this by hand and accidentally deleted two or three).

Dependent variable: corrected poll
Number of obs = 499

variable | Coef. | t-statistic
correction | -0.342 | (-4.23)
constant | 7.882 | (51.65)

The evidence of under-correction is much stronger. The point estimate now is that the correct correction is about four thirds of the one fivethirtyeight applies (1 + 0.342 ≈ 1.34); with the smaller sample of more recent polls, the estimate was about three halves (1.47).

Now first differences: a regression of the change in the corrected poll on the change in the correction.

Dependent variable: change in the corrected poll

variable | Coef. | t-statistic
change in correction | -0.311 | (-4.23)
constant | -0.002 | (-0.01)
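Here is a sketch of the first-difference regression, assuming the corrected and raw series are listed in chronological order; diff and first_difference_slope are hypothetical helper names, and the example data are made up.

```python
from statistics import mean

def diff(series):
    """First differences: series[t] - series[t-1] for t = 1..n-1."""
    return [b - a for a, b in zip(series, series[1:])]

def first_difference_slope(corrected, raw):
    """Regress the change in the corrected poll on the change in the correction
    (correction = corrected - raw) between consecutive polls.
    Returns (slope, intercept) from ordinary least squares."""
    correction = [c - r for c, r in zip(corrected, raw)]
    x, y = diff(correction), diff(corrected)
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, ybar - slope * xbar

corrected = [8.0, 7.0, 7.5, 6.5]
raw = [8.0, 5.0, 6.5, 3.5]
print(first_difference_slope(corrected, raw))   # (-0.5, 0.0)
```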

This is a fairly dramatic result. If fivethirtyeight used the correct corrections, then the best estimate of the next corrected poll, based on the current corrected poll and the identities of the polling firms, would be just equal to the current corrected poll (this is a general property of best forecasts). It isn’t. It is possible to predict a nonzero expected change in fivethirtyeight’s forecast based on the identities of the two polling firms.

This should not be possible. It would not be possible if the fivethirtyeight formula were optimal.
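The argument can be written compactly. With $C_t$ the corrected poll and $K_t$ its correction (notation introduced here), the first-difference regression is

$$
\Delta C_{t+1} = \alpha + \beta\,\Delta K_{t+1} + \varepsilon_{t+1}.
$$

If the corrections fully removed house effects, and assuming voting intentions barely move between consecutive polls, then

$$
\mathbb{E}\left[\Delta C_{t+1} \mid \text{firm}_t,\ \text{firm}_{t+1}\right] = 0.
$$

Since $\Delta K_{t+1}$ is (approximately, if each firm’s correction is roughly stable over time) determined by the two firms’ identities, this implies $\beta = 0$. The estimate $\hat{\beta} = -0.311$ with $t = -4.23$ is what rejects it.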

Now, they changed their formula fairly recently. They found that with an older approach their estimates of the Democrats’ lead were mean-reverting. This means that they weren’t smoothing enough to be optimal, so they switched to putting more weight on older polls (more smoothing).

There is similarly strong evidence that they aren’t correcting enough for house effects.

The t-statistic of -4.23 is statistically significant evidence that the current fivethirtyeight formula is not optimal.