don’t expect too much…

Posted on Tuesday 10 January 2012

This one was just published in the American Journal of Psychiatry by the designer of the DSM-5 Field Trials et al [I’ve tried to separate out the chaff, but it’s still long]. Its message is, "don’t expect too much from the DSM-5 Field Trials". They must be getting worried about something:
DSM-5: How Reliable Is Reliable Enough?
by Helena Chmura Kraemer, David J. Kupfer, Diana E. Clarke, William E. Narrow, and Darrel A. Regier
American Journal of Psychiatry 2012 169:13-15.

… We previously commented in these pages on the need for field trials. Our purpose here is to set out realistic expectations concerning that assessment. In setting those expectations, one contentious issue is whether it is important that the prevalence for diagnoses based on proposed criteria for DSM-5 match the prevalence for the corresponding DSM-IV diagnoses. However, to require that the prevalence remain unchanged is to require that any existing difference between true and DSM-IV prevalence be reproduced in DSM-5. Any effort to improve the sensitivity of DSM-IV criteria will result in higher prevalence rates, and any effort to improve the specificity of DSM-IV criteria will result in lower prevalence rates. Thus, there are no specific expectations about the prevalence of disorders in DSM-5. The evaluations primarily address reliability.

… Reliability will be assessed using the intraclass kappa coefficient κI. For a categorical diagnosis with prevalence P, among subjects with an initial positive diagnosis, the probability of a second positive diagnosis is κI + P[1–κI], and among the remaining, it is P[1–κI]. The difference between these probabilities is κI. Thus κI=0 means that the first diagnosis has no predictive value for a second diagnosis, and κI=1 means that the first diagnosis is perfectly predictive of a second diagnosis.

Reliability is essentially a signal-to-noise ratio indicator. In diagnosis, there are two major sources of “noise”: the inconsistency of expression of the diagnostic criteria by patients and the application of those criteria by the clinicians. It is all too easy to exaggerate reliability by removing some of that noise by design. Instead of a representative sample, as in DSM-5 field trials, one might select “case subjects” who are unequivocally symptomatic and “control subjects” who are unequivocally asymptomatic, omitting the ambiguous middle of the population for whom diagnostic errors are the most common and most costly. That approach would hide much of the patient-generated noise.

… It is unrealistic to expect that the quality of psychiatric diagnoses can be much greater than that of diagnoses in other areas of medicine, where diagnoses are largely based on evidence that can be directly observed. Psychiatric diagnoses continue to be based on inferences derived from patient self-reports or observations of patient behavior. Nevertheless, we propose that the standard of evaluation of the test-retest reliability of DSM-5 be consistent with what is known about the reliability of diagnoses in other areas of medicine.

From these results, to see a κI for a DSM-5 diagnosis above 0.8 would be almost miraculous; to see κI between 0.6 and 0.8 would be cause for celebration. A realistic goal is κI between 0.4 and 0.6, while κI between 0.2 and 0.4 would be acceptable. We expect that the reliability [intraclass correlation coefficient] of DSM-5 dimensional measures will be larger, and we will aim for between 0.6 and 0.8 and accept between 0.4 and 0.6. The validity criteria in each case mirror those for reliability.

… The Lancet once described the evaluation of medical diagnostic tests as “the backwoods of medical research,” pointing out that many books and articles have been written on the methods of evaluation of medical treatments, but little attention has been paid to the evaluation of the quality of diagnoses. Only recently has there been attention to standards for assessing diagnostic quality. Yet the impact of diagnostic quality on the quality and costs of patient care is great. Many medical diagnoses go into common use without any evaluation, and many believe that the rates of reliability and validity of diagnoses in other areas of medicine are much higher than they are. Indeed, psychiatry is the exception in that we have paid considerable attention to the reliability of our diagnoses. It is important that our expectations of DSM-5 diagnoses be viewed in the context of what is known about the reliability and validity of diagnoses throughout medicine and not be set unrealistically high, exceeding the standards that pertain to the rest of medicine…
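Since these kappa numbers carry the whole argument, here’s a minimal sketch of the arithmetic in Python – with made-up prevalence and kappa values of my own, not anything from the Field Trials. The first function is just the predictive-value relationship quoted above; the second is the familiar chance-corrected-agreement definition of kappa described below.

    # A sketch with hypothetical numbers -- nothing here comes from the actual Field Trials.

    def second_diagnosis_probs(kappa, prevalence):
        """The excerpt's arithmetic: chance of a second positive diagnosis,
        given the result of the first, for intraclass kappa and prevalence P."""
        p_pos_after_pos = kappa + prevalence * (1 - kappa)  # first diagnosis was positive
        p_pos_after_neg = prevalence * (1 - kappa)          # first diagnosis was negative
        return p_pos_after_pos, p_pos_after_neg

    def chance_corrected_kappa(observed_agreement, chance_agreement):
        """Kappa as chance-corrected agreement: 1.0 = perfect, 0.0 = no better than chance."""
        return (observed_agreement - chance_agreement) / (1 - chance_agreement)

    # Suppose a disorder with 10% prevalence and a kappa of 0.4 -- their "realistic goal" floor.
    pos, neg = second_diagnosis_probs(0.4, 0.10)
    print(round(pos, 2), round(neg, 2), round(pos - neg, 2))  # 0.46 0.06 0.4 -- the gap is kappa itself

With those made-up numbers, a positive diagnosis at the first visit raises the chance of a positive at the second from 6% to 46% – a real signal, but a long way from the near-certainty most readers probably imagine "reliable" to mean.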
Cohen’s Kappa Coefficient [κI] is used to measure inter-rater concordance, e.g. two evaluators seeing the same case independently. Perfect agreement results in κI = 1.0. The concordance expected by chance alone results in κI = 0. So the κI values they’re talking about are between 0 and 1.0 – the higher the κI, the better the inter-rater reliability. Why are they going to all this trouble to tell us not to set our sights too high? Why are they telling us that the comparative prevalence between the DSM-IV and DSM-5 doesn’t matter? Here’s what Dr. Allen Frances has to say about that:
Two Fallacies Invalidate The DSM 5 Field Trials
Psychiatric Times
by Allen Frances
January 9, 2012

The designer of the DSM 5 Field Trials has just written a telling commentary in the American Journal of Psychiatry. She makes two very basic errors that reveal the fundamental worthlessness of these Field Trials and their inability to provide any information that will be useful for DSM 5 decision making.

[1] The commentary states: “A realistic goal is a kappa between 0.4 and 0.6, while a kappa between 0.2 and 0.4 would be acceptable.” This is simply incorrect and flies in the face of all traditional standards of what is considered ‘acceptable’ diagnostic agreement among clinicians. Clearly, the commentary is attempting to greatly lower our expectations about the levels of reliability that were achieved in the field trials – to soften us up to the likely bad news that the DSM 5 proposals are unreliable. Unable to clear the historic bar of reasonable reliability, it appears that DSM 5 is choosing to drastically lower that bar – what was previously seen as clearly unacceptable is now being accepted. Kappa is a statistic that measures agreement among raters, corrected for chance agreement. Historically, kappas above 0.8 are considered good, above 0.6 fair, and under 0.6 poor. Before this AJP commentary, no one has ever felt comfortable endorsing kappas so low as 0.2-0.4. As a comparison, the personality section in DSM III was widely derided when its kappas were around 0.5. A kappa between 0.2-0.4 comes dangerously close to no agreement. ‘Accepting’ such low levels is a blatant fudge factor – lowering standards in this drastic way cheapens the currency of diagnosis and defeats the whole purpose of providing diagnostic criteria.

Why does this matter? Good reliability does not guarantee validity or utility – human beings often agree very well on things that are dead wrong. But poor reliability is a certain sign of very deep trouble. If mental health clinicians cannot agree on a diagnosis, it is essentially worthless. The low reliability of DSM 5 presaged in the AJP commentary confirms fears that its criteria sets are so ambiguously written and difficult to interpret that they will be a serious obstacle to clinical practice and research. We will be returning to the wild west of idiosyncratic diagnostic practice that was the bane of psychiatry before DSM III…

[2] The commentary also states: “one contentious issue is whether it is important that the prevalence for diagnoses based on proposed criteria for DSM-5 match the prevalence for the corresponding DSM-IV diagnoses” …. “to require that the prevalence remain unchanged is to require that any existing difference between true and DSM-IV prevalence be reproduced in DSM-5. Any effort to improve the sensitivity of DSM-IV criteria will result in higher prevalence rates, and any effort to improve the specificity of DSM-IV criteria will result in lower prevalence rates. Thus, there are no specific expectations about the prevalence of disorders in DSM-5.” This is also a fudge. For completely unexplained and puzzling reasons, the DSM 5 field trials failed to measure the impact of its proposals on rates of disorder. These quotes in the commentary are an attempt to justify this fatal flaw in design. The contention is that we have no way of knowing what true rates of a given diagnosis should be – so why bother to measure what will be the likely impact on rates of the DSM 5 proposals. If rates double under DSM 5, the assumption will be that it is picking up previous false negatives with no need to worry about the risks of creating an army of new false positives…

The DSM 5 proposals will uniformly increase rates, sometimes dramatically. Not to have measured by how much is unfathomable and irresponsible. The new diagnoses suggested for DSM 5 will [mis]label people at the very populous boundary with normality… The field trial developers seem either unaware or insensitive to the unacceptable risks involved in creating large numbers of false positive, pseudo-patients. Indeed, quite contrary to the blithe assertions put forward in the commentary, we should have rigorous expectations about prevalence changes triggered by any DSM revision… We have known since they were first posted that none of the DSM 5 proposals comes remotely close to meeting a minimal standard for accuracy and safety. And now, the AJP commentary seems to be softening us up for the bad news that their reliability is also lousy…

Given our country’s current binge of loose diagnostic and medication practice [particularly by the primary care physicians who do most of the prescribing], DSM 5 should not be in the business of casually raising rates and offering inviting new targets for aggressive drug marketing. Instead, DSM 5 should be working in the opposite direction – taking steps to increase the precision and specificity of its diagnostic criteria. And the texts describing each disorder should contain a new section warning about the risks of overdiagnosis and ways of avoiding it… They started off on the wrong foot by asking the wrong question – focusing only on reliability and completely ignoring prevalence… And now we get a broad hint that the reliabilities, when they are finally reported, will be disastrously low… What should be done now as DSM 5 enters its depressing endgame? There really is no rational choice except to drop the many unsupportable DSM 5 proposals and to dramatically improve the imprecise writing that plagues most of the DSM 5 criteria sets.
To summarize, the Field Trials will not give us prevalence data to compare with the DSM-IV and find out how much the "loosening" of the DSM-5 criteria inflates the various categories. And the gist of this American Journal of Psychiatry article is that they are preparing us for low inter-rater reliability from the Field Trials – in the range of lousy by Dr. Frances’ reckoning. I expect that people align into two camps – the Dr._Frances_is_being_too_picky camp and the DSM-5_is_a_train_wreck camp. I’m in the DSM-5_is_a_train_wreck camp, independently – though every time I run down one of Frances’ criticisms, he’s hit the target dead center. He just moves me further [and he’s certainly more informed]. I don’t think my complaint is the same as my DSM-III complaint. I’m kind of resigned to the basic flaws that I expressed the other day [dsmanything].

From my perspective, the DSM-5 was flawed from its inception. The leaders, Dr. David Kupfer and Dr. Darrel Regier, are ideologues – clinical neuroscience ideologues. Everything they write has that recent advances in neuroscience, neuroimaging, genomics... kind of talk, as if those recent advances are the reason for the revision in the first place. I don’t know what those recent advances actually were. I don’t think that they know either, because they put out one of those we had hoped… articles [in listening…, class action in the air…] – mirrored by Jeffrey Lieberman [Psychiatric Diagnosis in the Lab: How Far Off Are We?] – that says that their dreams of biomarkers hadn’t yet come to fruition, but hope springs eternal [in other words, no recent advances as planned]. They want to stick the word biological in the definition of mental illnesses. The psychologists are appropriately upset about this whole enterprise of biologizing the DSM-5 further. Me too. We’ve known that the descriptive moniker has been a front for biological from the start [see the Tenets of the neo-Kraepelinian approach in need not apply…], but at least they’ve paid lip service to the descriptive idea previously. Not Drs. Kupfer and Regier, who are representatives of the ruling class of clinical neuroscientists who have brought us the corruption-prone age of psychopharmacology and pharmaceutical collusion. They may not be big-time sinners, but they’ve run with the wrong crowd, and they only speak the lingo of those streets.

My second complaint is closer to that of Dr. Frances. They’re deaf. They’re creating a DSM that will make everything that’s wrong even worse. Loosening diagnostic criteria. Adding superfluous diagnoses. All but promoting over-diagnosis and over-medication instead of applying the brakes. Working behind closed doors at a time when good-old-boy networks and epidemic stealth have already created palpable negative consequences. Carrying the standard of biology and neuroscience at a time when they need to make room for some other flags. They’ve ignored the wind-sock that should’ve alerted them to the backlash they were walking into. And, if I may simply say the obvious, they’re not very good at design or administration. The DSM-5 process, besides being secretive, has been a mess – missed deadlines, indecision, empty jargon-ese responses to criticisms after their early nasty-gram with Dr. Schatzberg [Setting the Record Straight: A Response to Frances Commentary on DSM-V].

Here’s what I think happened: The process for the DSM-5 Revision started in 1999 [DSM-5 Overview: The Future Manual]. From 2000 through 2006 [when Drs. Kupfer and Regier were officially appointed] there were innumerable conferences about the coming Revision. It was the height of the Age of Psychopharmacology and Clinical Neuroscience [as psychiatry was renamed by NIMH Director Tom Insel in 2005]. They set a course in those salad days to sail into a brave new world. Then large cracks began to form in their planet’s crust – Senator Grassley’s investigations, chairmen "stepping down," large settlements against drug companies with the release of damning documents, an increasingly bad press [well deserved], failed recent advances in all directions, the disappointments of CATIE and STAR*D, and lots of pushback from blogs and books galore. The DSM-5 Task Force didn’t hear the music, failed to read the tea leaves, and kept running on the inertia from a previous heady time, steered by the likes of APA President-elect, then President, Alan Schatzberg. After all, the future budget of the APA depended on the product. Like an aging starlet with too many facelifts, the dreams from the past couldn’t accept the mirror of the present – still can’t. So they didn’t adapt to the changing climate.

Now they’re trying to prepare us for disappointing Field Trials amid a consumer rebellion being led by their predecessor, Dr. Allen Frances. The drug companies are turning tail, looking for new prospects and leaving the KOLs to fend for themselves. Many Czars have been deposed – Nemeroff, Schatzberg, Keller. And the everyday psychiatrists are apathetic, surviving in their disenfranchised [but still lucrative] world of med-checks. What’s going to happen? Dr. Frances says:
What should be done now as DSM 5 enters its depressing endgame? There really is no rational choice except to drop the many unsupportable DSM 5 proposals and to dramatically improve the imprecise writing that plagues most of the DSM 5 criteria sets.
I agree with the "should," but I see no signs yet that it’s going to happen. My guess is that they’ll only partially follow his advice – maybe back off on some of the more egregious proposals like Attenuated Psychosis Syndrome – but continue on with others. Maybe they’ll have the good sense to remove biological from the definition of mental illness. The likelihood that they’ll take on the sins of the past like Major Depressive Disorder has dwindled to the point where the candle has gone out [or maybe was never lit]. I understand why Dr. Frances is now also writing in the Huffington Post. He’s hoping to finally get their attention. They still don’t get it that he’s trying to help them not rush headlong into a no-exit minefield [or worse, the abyss of the irrelevant]…
  1.  
    Peggi
    January 10, 2012 | 8:38 AM
     

    Some terrific metaphors here, Mickey, but my favorite is the “aging starlet with too many facelifts”.

  2.  
    Tom
    January 11, 2012 | 7:56 AM
     

    In psychometric testing you would never use a ratio or variable that was coded if the kappa was below 0.80. To argue that kappa of .20 to .60 is acceptable is absurd.

  3.  
    January 11, 2012 | 3:44 PM
     

    I recently read Kirk & Kutchins’ book “The Selling of DSM”, which looked at how diagnostic reliability was used to create a false aura of scientificity around the DSM-III project. They argue, quite plausibly I think, that the political economic needs of an American psychiatry in crisis (esp. around insurer payments and drug research/approval) became tied up with this obsession with quantitative measures. In the absence of decent validity data then reliability was a useful step sideways to meet these needs.

    But of course back then K & K were able to excoriate the DSM field trial for accepting Kappas between 0.6 and 0.8, much lower than the above 0.8 originally hoped. How pathetic does the reliability obsession look now?

    Your view of why the DSM-5-ocracy have persisted with the naive hopes in neuroscience is compelling. I would add something else: Like the economists who usually serve Wall St, these people are tied to real material structures of profit-making and the complex research, industrial and financial complexes that make them up. Thus, the breakdown in hegemony (the passive or active consent of ordinary people in this process) can start to occur without directly affecting their certainty in the viability of the project *as experienced from their lofty positions*. In fact, the emergence of threats to their position will lead them to bunker down and seek to defend it, no matter how irrationally.

    Marx had a good name for economists: the hired prize-fighters of the bourgeoisie. Ideologists for the elites. Perhaps these research physicians, deeply tied to elite corporate and state power structures, are the hired prizefighters of the medical-industrial complex’s bourgeoisie?

  4.  
    January 11, 2012 | 4:19 PM
     

    Eloquently stated. Their responses seem more like reflexes than something informed by thoughtful contemplation – almost like Piaget’s “schema.” They’ve been talking to each other too long and forgotten that there’s another way to think.
