A futile quest

Bill Haskell | July 6, 2020 4:52 am

A futile quest, Why “performance” measurement is not working, Minnesota Physician The Independent Medical Business Journal, Kip Sullivan, April 2020

Intro: “Pay for Performance” is not a new catch phrase in the healthcare community, but it is one that has seen a recent spike in interest from the general public and healthcare world alike. The renewed interest is due to the Affordable Care Act (ACA) and initiatives within the Act that require hospitals and providers who participate in Medicare to engage in pay for reporting activities that will transition to pay for performance over the course of the next 3-4 years. Kip Sullivan writes here on pay for performance. Kip has also been a proponent of Single Payer and has written many articles, some of which have appeared on AB.

Over the last three decades, Minnesota’s health care policymakers have gotten into a bad habit: They recommend policies without asking whether there is sufficient evidence to implement the policy, and without spelling out how the policy is supposed to work. Measurement and “pay for performance” (P4P) schemes illustrate the problem. Multiple Minnesota commissions, legislators, agencies, and groups have endorsed the notion that it’s possible to measure the cost and quality of doctors, clinics, and hospitals accurately enough to produce results useful to regulators, patients, providers, and insurers.

But these policymakers did so with no explanation of how system-wide measurement was supposed to be done accurately, and without any reference to research demonstrating that accurate system-wide measurement is financially or technically feasible. The Minnesota Health Care Access Commission (in 1991) and the Minnesota Health Care Commission (in 1993) were the first of several commissions to exhibit this “shoot-first, aim-later” mentality. Both commissions recommended the establishment of massive data collection and reporting systems, and both articulated breathtaking expectations of the “report cards” these systems would produce. According to the latter commission, for example, the data collection and number crunching would facilitate “feedback of data that reflects the entire scope of the health care process, from the inputs or structural characteristics of health care to the processes and outcomes of care.” (p. 134) Yet neither commission offered even the crudest details on how such a scheme would be executed or what it would cost, and, not surprisingly, neither commission offered evidence supporting its high hopes.

In 2008, two other commissions and the Minnesota Legislature exhibited the same casual attitude toward evidence and details. That year, the Legislature, egged on by the commissions, passed a law requiring the Minnesota Department of Health (MDH) to create a “standardized set” of quality measures for Minnesota that would be used to punish and reward “health care providers” (Minnesota Statutes, Section 62U.02). The law offered a few guidelines (such as MDH should “seek to avoid increasing the administrative burden on health care providers”), but it offered no details on how MDH was supposed to create useful measures.

Policymakers at the federal level have exhibited the same attitude. Like the half-dozen commissions that have advised the Minnesota Legislature over the last three decades, the Medicare Payment Advisory Commission (MedPAC) has endorsed measurement and P4P schemes for Medicare on the basis of zero empirical evidence and without working out the details. As the Minnesota Legislature followed the evidence-free recommendations of the Minnesota commissions, so Congress has followed MedPAC’s undocumented recommendations. MedPAC’s influence is most apparent in the Affordable Care Act of 2010 and the 2015 Medicare Access and CHIP Reauthorization Act, which enacted the nation’s largest P4P program (the insanely complex Merit-based Incentive Payment System).

The proliferation of reporting and P4P schemes has triggered “significant rethinking of measurement activities at the federal government, by national measurement organizations and health care payers, and within state governments,” as MDH put it in a February 2019 report to the Legislature. (p. 8) Minnesota’s Legislature is among those doing some rethinking. It enacted a law in 2017 that requires MDH to develop a “framework” for evaluating MDH’s “performance” measurement program which was authorized by legislation enacted in 2008. Because feedback is useless if it is not accurate, MDH should make accuracy the single most important criterion in evaluating any proposed quality or cost measure. MDH should use this opportunity to explain to the Legislature why MDH’s measurement system and systems like it are grossly inaccurate.

Three impediments to accuracy

The inaccuracy of “performance” measurement has three distinct causes: 1) It measures a tiny fraction of the thousands of services a clinic or hospital delivers (the “bundled product” problem); 2) it is usually very difficult to determine which patient “belongs” to which doctor or clinic (the “attribution” problem); and 3) for all but the simplest of medical services, it is impossible to adjust scores accurately to reflect factors outside physician or hospital control (the “risk-adjustment” problem). I will illustrate each problem with an example, then examine each in more detail.

The “bundled product” problem is the easiest to understand. To illustrate this problem, consider this analogy. Imagine that you want to issue cost and quality report cards on Home Depot, Menard’s, and Lowe’s. For the sake of discussion, let’s say these stores sell ten thousand different items—appliances, tools, construction materials, paint, repair services, plants, etc. You decide your report card will issue grades on just five items—sod, circular saws, tile cleaner, varnish, and dry wall. You ignore the other 9,995 items and services. How useful is your report card?

Like home supply stores, clinics and hospitals sell thousands of services. There are 8,000 services doctors bill for (that’s roughly the number in the Current Procedural Terminology manual, the document all doctors use to find codes to put on their claim forms), and 68,000 diagnoses (that’s the number of diagnoses listed in the current iteration of the International Classification of Diseases maintained by the World Health Organization). MDH currently lists 29 measures on its website.
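To put those figures side by side, here is a minimal sketch of the arithmetic. The counts are the ones quoted above; treating them as a simple ratio is an illustrative assumption, since measures, billing codes, and diagnoses do not map one-to-one, and the names in the code are made up for this example.

```python
# Counts quoted in the article. Dividing one by another is only a rough
# illustration: measures, billing codes, and diagnoses are not one-to-one.
BILLABLE_SERVICES = 8_000   # approximate number of CPT codes
DIAGNOSES = 68_000          # diagnoses listed in the ICD
MDH_MEASURES = 29           # measures listed on the MDH website
STORE_ITEMS = 10_000        # home supply store analogy
STORE_GRADED = 5            # items graded in the analogy


def coverage_share(measured: int, total: int) -> float:
    """Return the fraction of `total` items covered by `measured` items."""
    return measured / total


store_share = coverage_share(STORE_GRADED, STORE_ITEMS)
service_share = coverage_share(MDH_MEASURES, BILLABLE_SERVICES)
diagnosis_share = coverage_share(MDH_MEASURES, DIAGNOSES)

print(f"Store analogy:     {store_share:.2%} of items graded")
print(f"Billable services: {service_share:.2%} covered by a measure")
print(f"Diagnoses:         {diagnosis_share:.2%} covered by a measure")
```

Under these assumptions the report card touches well under one percent of what a clinic or hospital does, which is the same order of magnitude as grading five items out of ten thousand.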

To illustrate the “attribution problem,” consider again the “optimal diabetes” measure discussed in Part 1 of this two-part series—a measure that Minnesota Community Measurement (MNCM) and many other report-card manufacturers use. This measures the percent of a doctor’s or clinic’s diabetic patients who have their blood sugar and blood pressure under control, who take aspirin and statins, and who don’t smoke. Obviously, the first step in calculating these percentages is to determine which patients “belong” to which clinic. But how do you do that? If you don’t do it accurately, you will be rewarding or punishing doctors for patients they don’t see.

To illustrate the third obstacle to accuracy—inaccurate adjustment of scores to reflect the impact of factors outside physician or hospital control—imagine that you have chosen the blood pressure measure within the “optimal diabetes” measure to be one of a handful of quality measures in your report card. You know that blood pressure is determined by multiple factors doctors have no control over, including patient age, income, education, willingness to exercise, stress levels at home and work, insurance coverage for and the price of blood pressure medications, etc. How do you adjust the scores on your report card to make sure they measure only physician expertise and not all those other factors?

Now imagine how inaccurate your report card is going to be if you can’t solve even one of these problems, never mind all three.

The bundled product problem: Treating to the test

Even the most expansive measurement-and-reporting schemes measure only a tiny fraction of the thousands of medical services sold in modern societies. Consider furthermore that each service can be evaluated at least four ways—by process measures (did the diabetic patient’s A1c levels get measured?), outcome measures (is the diabetic’s A1c level under 8?), structural measures (does the hospital have a catheterization lab?), and patient satisfaction as measured by surveys. The possible number of “quality” measures is in the tens of thousands. Compare tens of thousands to, for example, the 30 or so enforced by MDH and its contractor, MNCM, over the last 15 years.

A common argument presented by proponents of reporting schemes is that scores on some of the handful of measured services increase over time. But measurement proponents never investigate whether improvement on those scores was financed by “treating to the test,” that is, by shifting resources away from patients whose care was not measured. Common sense and a small body of research indicate that’s in fact what happens: measuring only the tiny fraction of services that MNCM and other P4P proponents track has induced treating to the test. If in fact improvement on a few scores is financed by a worsening of the quality of unmeasured services, overall quality (at both the system and provider level) may not have improved at all. And if patient preferences were bulldozed by providers under pressure to honor the priorities set by report card producers, overall quality may have gotten worse. In either event, a report card that measures a micro-fraction of all services delivered will be a grossly inaccurate reflection of the quality of the providers subjected to measurement.

The attribution problem: Measuring phantom patients

Unlike the bundled product problem, the attribution problem does not afflict all measurements. We know, for example, exactly which hospitals and which surgeons performed bypass surgery on which patients. If we want to prepare a report card on heart surgeons or the hospitals where heart surgery is performed, we don’t have to make up arbitrary, complex rules to assign patients accurately. But we do have to make up arbitrary and complex rules to attribute patients to doctors, clinics, and hospital-clinic chains when the report card measures services like those in the “optimal diabetes” bundle.

The most widely used attribution rule is to assign patients (without their knowledge) to a clinic or hospital-clinic chain if, during a baseline (or “lookback”) period of one or two years, patients made a plurality of their visits to the clinic or chain. Thus, if I visit Clinic A three times in 2019, Clinic B once, and Clinic C once, the plurality-of-visits rule will “attribute” me to Clinic A for the “performance year” 2020. Even if I never set foot in Clinic A in 2020, the doctors in Clinic A will be rewarded or punished based on my blood pressure, my blood sugar levels, whether I resume smoking in 2020, and so on, all outcomes they were totally unable to influence during 2020. Health policy analysts and consultants measure the integrity of these attribution algorithms (or the lack thereof) by measuring their “leakage rates,” that is, the rate at which patients fail to seek care often enough during the “performance year” to be assigned to the same clinic the next year. Research on “accountable care organizations” (groups of clinics and hospitals) and “medical homes” (single clinics), for which the plurality-of-visits method is used, finds leakage rates of an astonishing 30% to 40%. As you can imagine, the addition of all those phantom patients to the denominator of measures like the “optimal diabetes” measure, and the subtraction of so many real patients, substantially increases the noise-to-signal ratio of such measures.
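The plurality-of-visits rule and the leakage rate can be sketched in a few lines. This is a simplified illustration, not any payer's actual algorithm: the function names, the tie-handling choice (a tie leaves the patient unattributed), and the three hypothetical patients are all assumptions made for the example.

```python
from collections import Counter


def attribute_by_plurality(visits):
    """Return the clinic with the most visits, or None if there are no
    visits or the top spot is tied (tie handling is an assumption)."""
    ranked = Counter(visits).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def leakage_rate(lookback, performance):
    """Share of attributed patients whose performance-year visits would
    not assign them to the same clinic again."""
    attributed = {}
    for patient, visits in lookback.items():
        clinic = attribute_by_plurality(visits)
        if clinic is not None:
            attributed[patient] = clinic
    if not attributed:
        return 0.0
    leaked = sum(
        1
        for patient, clinic in attributed.items()
        if attribute_by_plurality(performance.get(patient, [])) != clinic
    )
    return leaked / len(attributed)


# Hypothetical patients. "author" follows the example in the text:
# three visits to Clinic A in 2019, one each to B and C, none in 2020.
visits_2019 = {"author": ["A", "A", "A", "B", "C"], "p2": ["B", "B"], "p3": ["C"]}
visits_2020 = {"author": [], "p2": ["B"], "p3": ["C", "A", "A"]}

print(attribute_by_plurality(visits_2019["author"]))
print(f"{leakage_rate(visits_2019, visits_2020):.0%}")
```

In this toy data the author is attributed to Clinic A and then never returns, and a second patient drifts to a different clinic, so two of the three attributed patients leak. Clinic A is still scored on the first of them for 2020.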

The risk-adjustment problem

The third major contributor of noise to “performance” measures is crude risk adjustment. Risk adjustment is done to adjust scores for factors providers and insurance companies have no control over. The most efficient way to convey the unacceptable inaccuracy of today’s risk adjusters is to review the inaccuracy of the nation’s most widely used, most studied, and probably most accurate risk adjuster—the one CMS developed in the early 2000s to adjust payments to Medicare Advantage plans. This method, known as the Hierarchical Condition Categories (HCC) model, can only predict 12% of the variation in spending among Medicare enrollees. To understand how bad that is, consider these statistics reported by MedPAC: the HCC overestimates spending on the healthiest 20% of beneficiaries by 62% and underestimates spending on the sickest 1% by 21%. MedPAC has made it clear they have no expectation that the HCC can be made substantially more accurate.
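For readers unfamiliar with the “percent of variation predicted” statistic, the sketch below computes it (R-squared) for a toy predictor. The spending figures are invented purely for illustration and have nothing to do with Medicare data or the HCC model itself. The toy predictor pulls every estimate toward the average, which reproduces the pattern MedPAC describes: low spenders are overestimated and high spenders are underestimated.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_resid = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_resid / ss_total


# Hypothetical annual spending (in thousands of dollars) for five
# enrollees: four inexpensive patients and one very sick patient.
actual = [1, 2, 3, 4, 50]
mean_spend = sum(actual) / len(actual)

# A predictor that assigns everyone the average explains nothing.
everyone_average = [mean_spend] * len(actual)

# A crude adjuster that captures only a tenth of each person's
# distance from the average (the 0.1 is an arbitrary assumption).
crude = [mean_spend + 0.1 * (a - mean_spend) for a in actual]

print(f"Average-only predictor: R^2 = {r_squared(actual, everyone_average):.2f}")
print(f"Crude adjuster:         R^2 = {r_squared(actual, crude):.2f}")
print(f"Healthiest: actual {actual[0]}, predicted {crude[0]:.1f}")
print(f"Sickest:    actual {actual[-1]}, predicted {crude[-1]:.1f}")
```

A provider whose panel is tilted toward patients like the last one is paid or graded as though those patients were close to average, which is the mechanism behind the disparities discussed next.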

As these statistics suggest, inaccurate risk adjustment punishes providers who treat an above-average proportion of the sick and the poor and rewards those who treat an above-average proportion of the healthy and higher-income. This worsening-of-disparities effect can be seen, for example, in the outcomes of the Hospital Readmissions Reduction Program (HRRP), a program foisted on the fee-for-service Medicare program by the Affordable Care Act. The HRRP punishes hospitals with 30-day readmission rates above the national average. CMS uses a risk adjustment method similar to the HCC to adjust readmission rates for factors outside hospital control, but the risk adjuster is so bad it routinely punishes hospitals with sicker patients. Research published in the last three years indicates the HRRP may be killing heart failure and pneumonia patients.

MDH, MNCM, and other “performance measurers” use risk-adjustment schemes that are even cruder than the HCC, and in some cases they use no risk adjustment at all. MDH uses payer mix (the percent of patients insured by Medicaid, Medicare, and private insurers) as its risk adjuster. Unlike CMS, which reports the accuracy of its adjuster at least for cost (as opposed to quality), MDH has never reported what percent of the variation its payer-mix method explains. In a 2017 report to the Legislature (https://tinyurl.com/mp-2017-mdh), MDH did concede that “risk adjustment can typically only explain a fraction of differences in quality between providers,” and that it knew of no way to improve the accuracy of its crude payer-mix method. But, MDH concluded, that’s OK because the payer-mix method is “reasonable.” (p. 14)

Learning from failure

In its 1993 report to the Legislature, the Minnesota Health Care Commission based its breathtaking expectations of “performance” measurement on this breathtaking assumption: “The commission assumes that the dimensions of health care quality can be defined and measured in a useful and equitable way.” (p. 134) The commission endorsed this assumption without even acknowledging the sources of white noise discussed here (the bundled product, attribution, and risk adjustment problems), much less suggesting ways to overcome them. None of the subsequently appointed commissions questioned the 1993 commission’s fanciful assumption. Nor did the Legislature. It’s time Minnesota policymakers admit that the assumption was based solely on groupthink, that it persists to this day because of groupthink, and that it must at long last be rejected.

Rejecting that assumption does not mean rejecting measurement. The issue at hand is not whether measurement is useful, but whether inaccurate measurement is useful. Nor does it mean abandoning all efforts to improve the quality of medical services or the health of Minnesotans. It means abandoning the default diagnosis that all problems in our health care system are due to defects in our doctors and hospitals, entertaining the possibility that those problems that might be within provider control are due to insufficient resources, and abandoning the comforting myth that it’s possible to adjust “performance” scores accurately to reflect factors outside provider control. Above all, it means accepting the obligation to ensure that measurements are accurate before they are unleashed on Minnesota’s doctors and hospitals.

Kip Sullivan, JD, is a member of the Health Care for All Minnesota Advisory Board. He was a member of Gov. Perpich’s Health Plan Regulatory Reform Commission. His articles have appeared in the New England Journal of Medicine, Health Affairs, and other peer-reviewed journals.

16 Comments

anne says:

July 6, 2020 at 9:05 am

Kip Sullivan is excellent.

The problem with standard performance measurement as a means of lowering health care costs while supposedly raising care quality is that patients differ, and performance standards will immediately be more difficult to meet the more difficult a patient is to care for. There will then be an incentive to recruit and care for patients who have fewer needs. This will also necessarily be discriminatory by class and ethnicity.
anne says:

July 6, 2020 at 11:05 am

Is it Impossible to Envision a World Without Patent Monopolies?

July 6, 2020

By Dean Baker

Apparently at the New York Times the answer is no. Elisabeth Rosenthal, who is a very insightful writer on health care issues, had a column * this morning warning that we may face very high prices for a coronavirus vaccine. She points out that this is in spite of the fact that the government is paying for much of the cost of the research. Rosenthal then argues we should adopt a system of price controls or negotiations, as is done in every other wealthy country.

While her points are all well-taken, the amazing part is that she never considers the simplest solution: just don’t give the companies patent monopolies in the first place. The story here is the government is paying for most of the research upfront. Why does it have to pay for it a second time by giving the companies patent monopolies?

There is no reason that the government can’t simply make it a condition of the funding that all research findings are fully open and that any patents will be in the public domain so that any vaccine will be available as a cheap generic from the day it comes on the market. Not only does this ensure that a vaccine will be affordable, it will likely mean more rapid progress, since all researchers will be able to immediately learn from the successes or failures of other researchers.

It is amazing that this obvious route is not being considered in public debate. Government-granted patent and copyright monopolies are one of the main ways in which we generate inequality. Bill Gates would still be working for a living without them.

At a time when the country is newly focused on racial inequality, it is striking that reducing the importance of the factors that generate inequality in the first place is not even up for discussion. This is fitting with the good old “White Savior” theory of politics.

Rather than changing the government-created structures that generate inequality, they would rather have the beneficent government push policies that reverse some of the inequality government structures created in the first place. I suppose this route is more appealing to the liberal psyche, but it ignores economic reality, and also at the end of the day, is likely to be less effective politically.

* https://www.nytimes.com/2020/07/06/opinion/coronavirus-vaccine-cost.html
anne says:

July 6, 2020 at 1:47 pm

The Problem of Medical Supplies Was a Stockpile Problem, not a China Supply Chain Problem

July 6, 2020

By DEAN BAKER

As Donald Trump tries to whip up anti-Chinese sentiment as part of his re-election campaign, many have joined in by complaining about our Chinese supply lines. The New York Times joined this effort by highlighting * the extent to which the U.S. and the rest of the world rely on China for medical supplies.

While it is arguable whether the reliance on China for medical supplies or anything else is a big problem, that was not the main issue when it came to shortfalls of protective gear and other medical supplies at the start of the pandemic. Those shortages stemmed from the government’s failure to maintain adequate stockpiles of essential equipment.

Even if all this equipment was produced in the United States, our factories could not immediately ramp up production by a factor of five to meet the demand the country was facing in March and April. The only way this demand could have been met quickly was if the stockpiles were already in place. This was the major failure in the crisis and it obscures the issue to complain about supply chains going through China.

As a practical matter, it is striking how little production has been disrupted in response to a once in a century pandemic and totally incompetent leadership at the national level. That would seem to imply that our supply chains are not a big problem.

* https://www.nytimes.com/2020/07/05/business/china-medical-supplies.html
- run75441 says:
  
  July 6, 2020 at 2:36 pm
  
  anne:
  
  Since supply chain is my bailiwick and I was really successful at it, I would say you carry the inventory in raw material and not finished goods (which automotive does consistently and has annual sales to dump inventory). It is more expensive to carry finished goods in any quantity. Some of this stuff has shelf life issues also.
  
  Dean is making a broad based assumption on capacity not being available. How many shifts were they running? What is the shift and machine capacity? The supply chain stretches to China, three weeks on the ocean and one week on the docks and in customs on both sides.
  
  You could always fly it over and cut it down to two weeks. Expensive maneuver but a credibility builder for the company. Most of the masks I see are made by China based companies not US owned.
JaneE says:

July 6, 2020 at 1:51 pm

I am not sure that reasonably accurate performance measuring is even possible without universal coverage and probably single payer, or single primary payer for everyone. One reason the British Covid-19 studies are so highly regarded is the fact that they have enough people in them to validate the results, courtesy of the national health system.

So long as part of the population doesn’t even have routine access to health care, the numbers for almost every condition will be distorted. The differing coverage levels for those that do have access will still distort the treatments. Is it the fault of the doctor that the patient cannot have access to the best drug, because it is off formulary?
- run75441 says:
  
  July 6, 2020 at 8:48 pm
  
  Hi Jane:
  
  Medicare and the VA both have a formulary which they follow. Medicare is starting to use VA pricing for its drugs also. Medicare and the VA will approve drugs for its patients. What you are discussing is a problem with Commercial healthcare and Pharma Insurance. My two cents. Let me know if this answers you and I will look further. Just to add to this.
  
  The problem today is there is no balance on pricing with regard to costs.
  
  – Hospitals over a certain period of time (this is on AB) were able to increase pricing 60% none of which was due to cost.
  
  – The ICER selected 100 drugs. Of the 100, they began to review those drugs which had an increase in price of twice medical CPI. Within the 77, they chose 10 to review extensively. Two of the drugs had sustainable reasons (without an extensive review) for a price increase; the rest did not.
  
  – The latest reasoning is about Gilead’s remdesivir being set at a price of ~$3,000 for 10 doses. The ICER said a reasonable price would be ~$5,000 for 10 doses. Given there is no cure, does either price appear to be reasonable?
  
  – Henry Ford Hospitals ACO completed an observational study of HCQ, HCQ and AZT, and AZT alone with ~2400 patients over an approximate 3 month period. Using a dosage similar to what was used in France and Korea, and timing of treatment pre-5th day, their results mimicked what was reported in Korea and France. In other words it worked. HCQ is far less costly.
  
  The bottom line is we are paying way too much for performance and little is being done within today’s medical environment to bring it under control pre-single payer, of which Kip Sullivan is an advocate today. Google him. My HCQ post is fairly recent and you should be able to find it by flipping back a few pages.
anne says:

July 6, 2020 at 1:54 pm

So long as part of the population doesn’t even have routine access to health care, the numbers for almost every condition will be distorted….

[ Really important. ]
Bert Schlitz says:

July 6, 2020 at 3:15 pm

China doesn’t count for that much of the medical supply ratio. The U.S. makes a certain amount of product and then rations it. Trump did not do that, especially after the Ebola outbreak. Trump also didn’t use the defense act which is the real key behind materials shortage. Baker doesn’t seem to get the difference. The lines are always there. The money and orders were not.

There is no anti-China Trump to be found. His money flows prove that.
anne says:

July 6, 2020 at 4:07 pm

July 6, 2020

Coronavirus

US

Cases ( 3,013,072)
Deaths ( 132,753)
anne says:

July 6, 2020 at 5:52 pm

ICE is telling international students on F-1 and M-1 visas that if their school is doing online-only courses they must leave the country or transfer to a place with in-person instruction—or they'll be deemed in the US illegally and subject to deportation. https://t.co/O0T8QITNKG

— Sahil Kapur (@sahilkapur) July 6, 2020

https://www.ice.gov/news/releases/sevp-modifies-temporary-exemptions-nonimmigrant-students-taking-online-courses-during

SEVP modifies temporary exemptions for nonimmigrant students taking online courses during fall 2020 semester

The Student and Exchange Visitor Program (SEVP) announced modifications Monday to temporary exemptions for nonimmigrant students taking online classes due to the pandemic for the fall 2020 semester. The U.S. Department of Homeland Security plans to publish the procedures and responsibilities in the Federal Register as a Temporary Final Rule.

2:55 PM · Jul 6, 2020
anne says:

July 6, 2020 at 7:06 pm

July 6, 2020

Coronavirus

US

Cases ( 3,028,126)
Deaths ( 132,850)

India

Cases ( 720,346)
Deaths ( 20,174)

UK

Cases ( 285,768)
Deaths ( 44,236)

Mexico

Cases ( 256,848)
Deaths ( 30,639)

Germany

Cases ( 198,057)
Deaths ( 9,092)

Canada

Cases ( 105,934)
Deaths ( 8,693)

Sweden

Cases ( 73,061)
Deaths ( 5,433)

China

Cases ( 83,557)
Deaths ( 4,634)
- run75441 says:
  
  July 6, 2020 at 8:43 pm
  
  Hi anne:
  
  Normally on my posts I would let you post information on healthcare which may not be related to the topic of the post. This is a guest poster who I am trying to recruit to post on Single Payer here also. I am going to ask if you can please keep to the topic he is writing about. I really enjoy having you here as you are a wealth of information which is very necessary to present. If you have anything special, I would be happy to post it for you on AB also.
  
  Thank you anne!
  
  Bill
anne says:

July 6, 2020 at 8:52 pm

Ah, sorry; do erase the posts since obviously the post from Kip Sullivan is especially important.
- run75441 says:
  
  July 6, 2020 at 8:55 pm
  
  anne:
  
  It’s OK. Thank you though.
anne says:

July 7, 2020 at 11:53 am

Thank you, but my foolish posts deflected from or obscured the important “Pay for Performance” matter presented by Kip Sullivan.
reason says:

July 8, 2020 at 5:05 am

I think the problem, stated succinctly, is that “in order to manage something, we have to measure it” translates to “we only manage what we can measure”. The whole theory of management is impractical and has caused enormous problems, because it narrows focus to things that often don’t really matter.