Relevant and even prescient commentary on news, politics and the economy.

Data Integrity Requires Personal Integrity and Vetting

Anyone who has ever worked with data knows that making certain the information is “clean” is much more important than what you do with it. Brilliant analysis of inaccurate data may make heroes (Chamley, Prescott, etc.), but it doesn’t make sensible policy. Witness the release of “public” data from the state of Texas.

Jeremy once commented that the transition from a public to a private university was the choice between everyone knowing your earnings (public data) and everyone knowing your student reviews.

What happens when (1) everyone knows your salary but (2) what they know isn’t true:

“My salary on the spreadsheet is $30,000 higher than my annual contract salary,” a lecturer in the University of Texas system wrote in an e-mail message. He did not want to be named because of his status as a non-tenure-track employee. “Other details are correct, but the number 99 percent of people will want to see is not. The spreadsheet says that it’s me, but it is not me.”

So why would this happen? Because public universities in Texas aren’t allowed to treat their data as if they are private companies:

System officials said the decision to release the data even though it was in draft form and included many gaps resulted from receiving multiple requests for it from news-media outlets under the state’s open-records laws. After noting the data’s shortcomings, in a disclaimer cautioning that the information was “incomplete and has not been fully verified or cross-referenced,” the system released it “in the spirit of openness and transparency” because much of the data was already public anyway, said Anthony P. de Bruyn, director of public affairs.

Except, of course, that the data wasn’t already public. Public data is vetted; this had not been. So why was it released?

Apparently, because no one asked what the legal consequences would be if they made certain it was accurate first:

Thomas Kelley, a spokesman for the Texas attorney general’s office, said in an e-mail that the state’s Public Information Act “applies to records available on the date of request,” even if the university system thought the records being requested were incomplete. Mr. Kelley said system officials had two options: release the data available at the time or request a ruling from the attorney general’s office about whether the incomplete data must be released.

And what was wrong? Well, almost anything:

Mr. de Bruyn said the data, which spans the system’s nine academic campuses, was collected mostly at the institutional level. Yet professors have noticed some mistakes in the data that seem to point to a more-distant process.

For instance, Renee Rubin, an associate professor in the department of language, literacy, and intercultural studies at the University of Texas at Brownsville, said she and her department chair were listed in the wrong department. “So then you begin to wonder what else is wrong,” Ms. Rubin says. [emphasis mine]

Is it more worrisome if Mr. de Bruyn is correct, or if he is not?

I’m thinking about this more intensely now in part because of the pending end of the publication of the invaluable Statistical Abstract of the United States. As Kieran Healy noted, “When it comes to the United States, the print and online versions of the SA are a peerless source of information for all your bullshit remediation needs.”

Unlike the Texas imbroglio, the Statistical Abstract has a 133-year publication history and a well-established reputation for accuracy. Destroying that reputation would have taken only one major incident; will anyone ever trust public data released in Texas again?

But the Obama Administration’s “transparency initiatives” appear to be failing here as well, as an Unforced Error. (paging Brad DeLong) And the consequence will be something that more resembles the Texas debacle than accurate, independent policy analysis.

Unless the Administration considers providing piecemeal, unstandardized information to be a feature, it appears to be a substantive bug.


Larger, greater than expected…Declines

by Divorced one like Bush

US Initial jobless claims decline larger than expected in May 31 week (6/5/08)

Mortgage finance giant suffers much larger-than-expected loss due to reserves for credit losses and slashes its dividend to preserve capital. (8/8/08)

Factory orders decline more than expected in August (10/2/08)

Retail sales for September posted their steepest fall in 3 years, down by 1.2%, larger than the expected 0.7% decline…The New York Fed manufacturing index plunged to a record low for October, sharply worse than expectations to -24.62…Meanwhile producer prices climbed higher than forecast…(10/16/08)

The University of Michigan consumer sentiment survey posted its steepest drop on record, collapsing far more than expected to 57.7 in October… (10/18/08)

A larger than expected decline in building approvals reflects continued investor caution due to high interest rates, economists say. (Australia, 9/30/08)

So, what does it say when we read headlines that start with “Larger than expected decline”? Really.

What world are these people living in, such that what they read in the data surprises them? And it is a worldwide phenomenon, as the Australian item above shows. Should we not be concerned that those who are being relied on to manage our economy did not expect what they are now seeing? Reading so many articles that open with the phrase “larger,” “bigger,” or “greater” “than expected decline” does not bode well for all the analysis that has been relied on by the managers (would this include investors?) of our economy. I mean, I posted here in comments some time ago that my flower shop has been seeing a steady decline since August 1996. I noted when the first quarter reports for this year came out, based on my flower shop, that we would see some good numbers, but it would only be a burp of pent-up desire to spend and should not be considered positive because of the major drop in April and May (usually among the largest months for flowers).

Do you remember the commentary on how the rising oil prices would not hurt the economy because it was still “relatively” cheap? Was the talk that if oil got back down to $70/barrel then we would know the pricing was speculation? Well? What, are we not concerned now to know if the oil price rise was speculation or not? Or is this another “larger than expected decline”?

Yet, here we are reading commentary that, if we are honest and real, should be giving us great pause as Congress formulates policy. Such commentary of surprise as numbers are reported should be raising questions such as: Are we monitoring relevant data for our purpose? Are our theories of what is significant regarding our economic intent valid? Is the data an accurate description of what is happening in our economy? And, the most basic: Why are we surprised? Why were our expectations wrong? Should we be looking for a different school of economic thought than the one that has dominated? (Some time ago, here in RIland, an economist from URI thought that tolls collected for the Newport Bridge were a good indicator for our state economy. He found that he was correct.)

Is now the time to broaden the discussion or are we going to wait for more unexpected surprises?
