# Spurious Correlation of the Day

Correlation is not causation, more research and testing are required, etc.

I was working from the concept that home Internet service is a luxury item or, at the very least, non-essential.* In short, I assumed that you would tend to give up home Internet access if the choice were between that and staying current on your mortgage.

Looking at the State-level data, though, produced the following regression:

HomeINetAccess = 0.78736*(FICO>660) – 0.1934*(Pct with Current Payment) – 0.1662*(Lying Broker Loans) + 73.36

R-squared = 0.4311; Adj. R-squared = 0.3956**
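For readers wondering how the adjusted figure relates to the raw one, here is a minimal sketch of the standard adjustment, which penalizes R-squared for each added predictor. The post does not state the number of observations, so `N_OBS = 52` below is purely an assumption (it happens to land near the reported value), and the function name is just an illustrative label.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# ASSUMPTION: the sample size is not given in the post; 52 is a guess.
N_OBS = 52

# Three predictors (FICO>660, Pct with Current Payment, Lying Broker Loans).
adj = adjusted_r_squared(0.4311, N_OBS, 3)
print(f"adjusted R-squared with n={N_OBS}, k=3: {adj:.4f}")
```

Because the penalty factor (n - 1) / (n - k - 1) is always greater than 1 when there is at least one predictor, the adjusted value can never exceed the raw R-squared.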

Fortunately, only the FICO>660 (t=4.10) and the constant (t=7.77) were clearly significant at the 95% confidence level (Current t = –1.38, Lying Broker t = –0.95). And it seems intuitively obvious that people with better credit scores are more likely to be able to afford (and demand) home-based Internet access.
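For anyone curious where coefficients and t-statistics like these come from, here is a minimal sketch of ordinary least squares done by hand. The state-level data behind this post is not reproduced here, so the example runs on SYNTHETIC numbers. The variable names, the sample size of 52, and the generating coefficients are all assumptions made up for illustration, and in practice a statistics package would do this work.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(predictor_rows, y):
    """Fit y on a constant plus predictors; return (coefficients, t-stats, R-squared)."""
    X = [[1.0] + list(row) for row in predictor_rows]
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    rss = sum(e * e for e in resid)
    sigma2 = rss / (n - p)  # residual variance estimate
    t_stats = [beta[a] / math.sqrt(sigma2 * xtx_inv[a][a]) for a in range(p)]
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    return beta, t_stats, 1.0 - rss / tss


# SYNTHETIC data: none of these numbers come from the post's dataset.
rng = random.Random(0)
n_states = 52  # assumed sample size
fico_gt_660 = [rng.uniform(50, 80) for _ in range(n_states)]
pct_current = [rng.uniform(80, 98) for _ in range(n_states)]
# By construction, only the FICO share drives the outcome here.
access = [20 + 0.7 * f + rng.gauss(0, 3) for f in fico_gt_660]

beta, t_stats, r2 = ols(list(zip(fico_gt_660, pct_current)), access)
for name, b, t in zip(["constant", "fico_gt_660", "pct_current"], beta, t_stats):
    print(f"{name:12s} coef = {b:8.4f}  t = {t:6.2f}")
print(f"R-squared = {r2:.4f}")
```

Each t-statistic is just the coefficient divided by its standard error, which is why a variable can carry a sizable coefficient and still fail to register as significant.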

Removing “Lying Broker Loans,” strangely, didn’t change the sign on Pct with Current Payment, though it did reduce the estimated effects and lower the constant.

HomeINetAccess = 0.65055*(FICO>660) – 0.1159*(Pct with Current Payment) + 67.42

R-squared = 0.4204; Adj. R-squared = 0.3967

Fortunately, Pct with Current Payment remains an insignificant variable (t = –1.02); indeed, it becomes even less significant than before.
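To put rough numbers on "insignificant," here is a small sketch that converts a t-statistic into an approximate two-sided p-value. It uses a standard normal approximation rather than the exact Student t distribution (an assumption made to keep the example dependency-free), so with roughly fifty observations the true p-values would be slightly larger. The function name is only an illustrative label.

```python
import math


def two_sided_p_normal(t_stat: float) -> float:
    """Approximate two-sided p-value, treating the t-statistic as standard normal."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# t-statistics as reported in the two regressions above.
reported = [
    ("FICO>660 (first regression)", 4.10),
    ("Pct with Current Payment (first regression)", -1.38),
    ("Lying Broker Loans (first regression)", -0.95),
    ("Pct with Current Payment (second regression)", -1.02),
]

for label, t in reported:
    p = two_sided_p_normal(t)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: t = {t:5.2f}, approx p = {p:.4f} ({verdict} at the 5% level)")
```

Under this approximation, anything with an absolute t-statistic below about 1.96 fails the 5% test, which is the case for every variable here except the FICO share.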

Curiously, there is one random regression that does appear significant.

HomeINetAccess = 0.43854*(FICO>660) – 0.3099*(Mortgage Originated in 2005 or before) + 80.89

R-squared = 0.429; Adj. R-squared = 0.4057

Here, both variables and the constant appear significant (t=3.5, –3.81, and 14.16, respectively). So we need a story to explain the negative sign, especially since running the same regression against the “Originated in 2006” or “Originated in 2007” values produces a larger R-squared and results with the intuitively correct sign:

HomeINetAccess = 0.40822*(FICO>660) + 0.5340*(Mortgage Originated in 2006) + 47.62

R-squared = 0.5217; Adj. R-squared = 0.5022; t(FICO>660) = 2.98; t(2006) = 3.41

HomeINetAccess = 0.53598*(FICO>660) + 0.5476*(Mortgage Originated in 2007) + 55.02397

R-squared = 0.5304; Adj. R-squared = 0.5112; t(FICO>660) = 4.63; t(2007) = 3.57

So people who bought at the peak of the bubble, or even when the bubble was beginning to break, are more likely to have Home Internet access than those who have been living in their house for a longer period of time. Indeed, having lived in your house for a longer period of time correlates *negatively*, on a State level, with having Home Internet access.

Were we to speculate, we might guess that people who have been living in their homes longer did not have Internet access easily available and affordable when they bought their home, and have not decided to add it now. (This would imply either that there are major transaction costs associated with gaining Internet access or that the people who bought in the pre-2006 environment are resource-constrained in other ways.)

As a reasonable speculation, people who bought in 2006 and 2007 (arguably, the top of the market) have, or believed they had, less price sensitivity than those who bought while the bubble was inflating. This might suggest that the people who were buying in 2006 were more likely to be “trading up” than buying for the first time. There is anecdotal evidence to that effect. Looking at the graphic of U.S. home ownership percentage, it appears that by 2006, the market consisted more of homeowners and speculators than it did new buyers, but the data I’m using does not have the granularity either to accept or reject that hypothesis.***

In any event, further research appears to be needed—or, maybe, this is just the Spurious Correlation of the Day.

*Jim Henley—and any other parent whose daughter is a Club Penguin devotee (for instance, me)—might disagree.

**Those not in the social sciences will look at these R-squared values and wonder if there is anything being presented. 40% is, I am told, a very good result. Indeed, since the entirety of Real Business Cycle theory is hung on an R-squared close to 0.50, certainly a finger exercise with a result that is only 80% of that would be, if not earth-shattering, then at least publishable.

***Suggestions for sources that might indicate whether buyers were speculators, such as state-level data showing whether a property was purchased as a primary residence or as a second (“vacation”) home, are welcome in comments or via e-mail.

I suggest you add age of the family to the regression, as older and/or retired people will be less likely to have Internet service and will have lived in the house a long time.

Also, the more precarious people feel, might they keep broadband as a replacement for a number of other services? I.e., phone becomes Skype, cable TV becomes broadband TV, newspaper becomes Google News, phone answering machine becomes email, job search becomes net-based, fish tank becomes YouTube…

A logical jump following that would be to drop broadband and get a smartphone. It replaces all those things, plus stereo, GPS etc, and can ride around in your pocket while you’re looking for a job and/or better cardboard box for the night.

Ken,

I’m assuming that the dependent variable (home internet) is a binary limited dependent variable, so the regression would be a probit (or perhaps logit) regression rather than a linear probability regression.

Might just be age-sensitivity combined with some other proxy for suburban locations vs highly rural areas or indeed certain kinds of urban areas. Another possibility is that internet access correlates to the ability to get a subprime loan via a specific avenue. Recall the television ads from the last decade about “banks competing to give you a loan” via various websites.

“Fortunately, only the FICO>600 (t=4.10) and the constant (t=7.77) were clearly significant at the 95% confidence level.”

This is off topic and sounds meaner than I would like it to, but this is the second time in a week I’ve run across mention of a 95% significance level on an economics blog. There is no 95% significance level; nothing is significant at 95%. Only a Republican would report a study with a 95% probability of rejecting a true null hypothesis. We know what you’re trying to say, but please don’t fool around with the language this way.

I’m just thinking aloud here, but seems like the early-mid bubble population was one where the majority overpaid for their house, which means a big chunk were stretching the affordability envelope, but those that were overstretched have already foreclosed and don’t count, leaving the borderline cases who have cut every expense besides the mortgage in an attempt to hang on. Those that came in at the most egregious time in the market and are still there could actually afford to overpay, and therefore buy “luxury” goods as well.

I think I would live on hobo beans before I cut my internet.