take notice…

Posted on Monday 24 September 2012

Recall that reliability has been measured as inter-rater reliability using kappa as the operative index by this formula:
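For two raters, kappa is conventionally written as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's own base rates. I am assuming this standard Cohen form is the one meant here. A minimal Python sketch, using made-up ratings rather than any Field Trial data:

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of subjects given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal rates, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: 10 patients, 1 = diagnosis given, 0 = not given.
first_clinician = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
second_clinician = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
kappa = cohen_kappa(first_clinician, second_clinician)
print(round(kappa, 3))  # prints 0.583
```

In this toy case the two clinicians agree on 8 of 10 patients, but chance alone would produce agreement on about half of them, so kappa comes out well below the raw 80%.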

As we approach time for the DSM-5 to be ratified by the Trustees of the American Psychiatric Association, here are the results of the DSM-5 Field Trials as I’ve been able to piece them together for review:

Disorder   DSM-5 (95% CI)     DSM-IV     ICD-10     DSM-III  

Major neurocognitive disorder .78 (.68 – .87) .66 .91
Autism spectrum disorder .69 (.58 – .80) .85 .77 -.01
Post traumatic stress disorder .67 (.59 – .74) .59 .76 .55
Child attention deficit disorder .61 (.51 – .72) .59 .85 .50
Complex somatic disorder .60 (.41 – .78) .45* .42*
Bipolar disorder .54 (.43 – .65) .69
Oppositional defiant disorder .41 (.21 – .61) .55 .66
Major Depressive Disorder (in adults) .32 (.24 – .40) .59 .53 .80
Generalized anxiety disorder .20 (.02 – .36) .65 .30 .72
Disruptive mood dysregulation disorder .50 (.32 – .66)
Schizophrenia .46 .76 .79 .81
Mild neurocognitive disorder .50 (.40 – .60)
Schizoaffective Disorder .50 .54 .51 .54
Mild traumatic brain injury .46 (.28 – .63)
Alcohol use disorder .40 (.27 – .54) .71 .80
Hoarding .59 (.17 – .83)
Binge Eating .56 (.32 – .78)
Major Depressive Disorder (in kids) .29 (.15 – .42)
Borderline personality disorder .58 (.46 – .71)
Mixed anxiety/depressive disorder .06
Conduct Disorder .48 .57 .78 .61
Antisocial Personality Disorder .22
Obsessive Compulsive Disorder .31
Attenuated Psychosis Syndrome .46 (0-?)

And below is a simple set of distribution graphs of kappa for this and previous revisions. On the left is a distribution of the Kappa values from our old system [DSM-II] using Dr. Spitzer’s 1974 meta-analysis of five reliability studies – values considered at the time unacceptable and a reason to radically revise the whole diagnostic system. We set our own standard back in those days, our own benchmark. Dr. Spitzer and his statistician colleagues essentially created and tested Kappa to be the reliability coefficient for psychiatric diagnosis. The other three graphs are from the actual Field Trials:
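As a rough text equivalent of the DSM-5 distribution, the point estimates in the table above can be tallied into bands. This is only a sketch: it assumes the first value in each table row is the DSM-5 kappa, and the 0.2-wide bands (lower bound inclusive) are my choice, not necessarily the bins used in the graphs.

```python
from bisect import bisect_right

# DSM-5 point estimates copied from the table above (first value in each row).
dsm5_kappas = [
    0.78, 0.69, 0.67, 0.61, 0.60, 0.54, 0.41, 0.32, 0.20, 0.50, 0.46, 0.50,
    0.50, 0.46, 0.40, 0.59, 0.56, 0.29, 0.58, 0.06, 0.48, 0.22, 0.31, 0.46,
]

# Assumed band edges; a value equal to an edge falls in the higher band.
cutoffs = [0.20, 0.40, 0.60, 0.80]
band_names = ["< .20", ".20 - .39", ".40 - .59", ".60 - .79", ">= .80"]

counts = [0] * len(band_names)
for k in dsm5_kappas:
    counts[bisect_right(cutoffs, k)] += 1

# Print a simple text histogram, one row per band.
for name, count in zip(band_names, counts):
    print(f"{name:>9} | {'#' * count} {count}")
```

Tallied this way, 13 of the 24 DSM-5 values sit between .40 and .59, and none reach .80.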

Before the Field Trials were reported in May, we were warned to lower our expectations:

DSM-5: How Reliable Is Reliable Enough?
by Helena Chmura Kraemer, David J. Kupfer, Diana E. Clarke, William E. Narrow, and Darrel A. Regier
American Journal of Psychiatry 2012 169:13-15.

… We previously commented in these pages on the need for field trials. Our purpose here is to set out realistic expectations concerning that assessment. In setting those expectations, one contentious issue is whether it is important that the prevalence for diagnoses based on proposed criteria for DSM-5 match the prevalence for the corresponding DSM-IV diagnoses. However, to require that the prevalence remain unchanged is to require that any existing difference between true and DSM-IV prevalence be reproduced in DSM-5. Any effort to improve the sensitivity of DSM-IV criteria will result in higher prevalence rates, and any effort to improve the specificity of DSM-IV criteria will result in lower prevalence rates. Thus, there are no specific expectations about the prevalence of disorders in DSM-5. The evaluations primarily address reliability.

… Reliability will be assessed using the intraclass kappa coefficient κI. For a categorical diagnosis with prevalence P, among subjects with an initial positive diagnosis, the probability of a second positive diagnosis is κI+P[1–κI], and among the remaining, it is P[1–κI]. The difference between these probabilities is κI. Thus κI=0 means that the first diagnosis has no predictive value for a second diagnosis, and κI=1 means that the first diagnosis is perfectly predictive of a second diagnosis.

Reliability is essentially a signal-to-noise ratio indicator. In diagnosis, there are two major sources of “noise”: the inconsistency of expression of the diagnostic criteria by patients and the application of those criteria by the clinicians. It is all too easy to exaggerate reliability by removing some of that noise by design. Instead of a representative sample, as in DSM-5 field trials, one might select “case subjects” who are unequivocally symptomatic and “control subjects” who are unequivocally asymptomatic, omitting the ambiguous middle of the population for whom diagnostic errors are the most common and most costly. That approach would hide much of the patient-generated noise.

… It is unrealistic to expect that the quality of psychiatric diagnoses can be much greater than that of diagnoses in other areas of medicine, where diagnoses are largely based on evidence that can be directly observed. Psychiatric diagnoses continue to be based on inferences derived from patient self-reports or observations of patient behavior. Nevertheless, we propose that the standard of evaluation of the test-retest reliability of DSM-5 be consistent with what is known about the reliability of diagnoses in other areas of medicine.

… From these results, to see a κI for a DSM-5 diagnosis above 0.8 would be almost miraculous; to see κI between 0.6 and 0.8 would be cause for celebration. A realistic goal is κI between 0.4 and 0.6, while κI between 0.2 and 0.4 would be acceptable. We expect that the reliability [intraclass correlation coefficient] of DSM-5 dimensional measures will be larger, and we will aim for between 0.6 and 0.8 and accept between 0.4 and 0.6. The validity criteria in each case mirror those for reliability…
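The relationship quoted above between κI, prevalence P, and the chance of a second positive diagnosis is easy to work through numerically. A minimal sketch, using kappa values from the Field Trial table and a prevalence of 10% that is purely hypothetical (the function name is mine):

```python
def second_diagnosis_probabilities(kappa, prevalence):
    """Return (P(second + | first +), P(second + | first -)) per the quoted formulas."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    p_after_positive = kappa + prevalence * (1 - kappa)
    p_after_negative = prevalence * (1 - kappa)
    return p_after_positive, p_after_negative


# Kappa values taken from the Field Trial table; the prevalence is an assumption.
ASSUMED_PREVALENCE = 0.10
examples = [
    ("Generalized anxiety disorder", 0.20),
    ("Major Depressive Disorder (in adults)", 0.32),
    ("Major neurocognitive disorder", 0.78),
]
for name, kappa in examples:
    pos, neg = second_diagnosis_probabilities(kappa, ASSUMED_PREVALENCE)
    print(f"{name}: {pos:.2f} after a positive first diagnosis, "
          f"{neg:.2f} after a negative one")
```

With κI = .20 and the assumed 10% prevalence, a positive first diagnosis is followed by a positive second diagnosis only 28% of the time, against 8% after a negative first diagnosis. The gap between the two is κI itself, whatever prevalence is assumed.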

And here are the two versions of the interpretation of Kappa from Dr. Kraemer’s and Dr. Frances’ articles in January of this year…

                 <0.20      >0.20 & <0.40    >0.40 & <0.60   >0.60 & <0.80   >0.80

Kraemer et al    negative   acceptable       realistic       celebration     miraculous
Allen Frances    negative   ~ no agreement   poor            fair            good
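To see how differently the two readings treat the same number, the table can be turned into a small lookup. This is a sketch under one assumption of mine: the table leaves the exact boundary values (.20, .40, ...) unassigned, so I put each boundary in the higher band.

```python
from bisect import bisect_right

# Band edges from the table header; a value equal to an edge goes to the higher band.
CUTOFFS = [0.20, 0.40, 0.60, 0.80]
LABELS = {
    "Kraemer et al": ["negative", "acceptable", "realistic", "celebration", "miraculous"],
    "Allen Frances": ["negative", "~ no agreement", "poor", "fair", "good"],
}


def interpret(kappa):
    """Return each scheme's label for a given kappa value."""
    band = bisect_right(CUTOFFS, kappa)
    return {scheme: labels[band] for scheme, labels in LABELS.items()}


# DSM-5 kappas from the Field Trial table.
for name, kappa in [("Major Depressive Disorder (in adults)", 0.32),
                    ("Bipolar disorder", 0.54),
                    ("Major neurocognitive disorder", 0.78)]:
    print(name, kappa, interpret(kappa))
```

Under this lookup, the adult MDD value of .32 is "acceptable" on one reading and "~ no agreement" on the other.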

Redefining the standards after the results are in just isn’t kosher [see Hypothesizing After Results Known = HARK!]. Just because it’s done in a lot of the Clinical Trial write-ups doesn’t mean it’s valid. Had Dr. Spitzer used Dr. Kraemer’s criteria in 1974, he wouldn’t have needed to revise much of anything in the DSM-III. And while all of the results of the Field Trials were disappointing, several values were particularly troubling. The representatives of the Task Force said of these very low results:

Disorders that did not do well include the following:

Disorder DSM-5 DSM-IV ICD-10 DSM-III

MDD (in adults)    .32 (.24 – .40) .59 .53 .80
GAD    .20 (.02 – .36) .65 .30 .72

In addition, MDD in children had a Kappa score of just .29 (.15 – .42). Dr. Regier reported that at the Dallas Veterans Affairs Medical Center in Texas, patients often had major depression along with PTSD, antisocial personality disorder, and mild TBI. "When other diagnoses were present, there tended to be a downplay of the reporting of depression in favor of the disorder considered more serious," he said. Jan Fawcett, MD, chair of the Mood Disorders Work Group, noted in a separate session that depression comorbidities were not allowed in the DSM-IV field trials. "It might be that that is the liability of that diagnosis in the real world," he said.

There has been a thread in the comments recently about how such things might be evaluated based on the bias of the observer. A DSM Skeptic might look at those numbers as confirming his/her skepticism. A DSM enthusiast might explain them away as Dr. Regier does, based on some mitigating factor. And certainly, the personal subjectivity, discipline [eg analyst vs neuroscientist], or other biases of the observer would be something to think about in this case. But whatever glasses you happen to wear, those numbers up there are a surprise – at least they were a big surprise to me. I am a longstanding critic of the Major Depressive Disorder category, feeling that it was much too broad and interfered with parsing out discrete depressions for study or treatment. But my objections wouldn’t explain such a low kappa value [if anything, they might suggest the opposite]. And it’s hard to rationalize the fact that the overall distribution of kappa values in the DSM-5 Field Trials is like that found by Dr. Spitzer in 1974 before his DSM-III revision ever happened – a step backwards.

I’ve expressed my personal complaint about the DSM-5 enough times so I won’t go into it here except to reiterate that I think this Task Force was primarily focused on escaping the atheoretical restraints of previous versions to finally achieve a neoKraepelinian dream of a biological basis for mental illness – and they failed. But that’s not my point here. For the moment, we have to respect what we don’t know, where we don’t agree, and find our way forward until the day when all is revealed. The DSM-III and subsequent revisions were an attempt to do just that, to keep an open mind until things are clear – to stick to what we do know. And with notable exceptions, the DSMs have been accepted as a fair shot at achieving some kind of objectivity in the squishy world we live and work in. Dr. Spitzer came up with atheoretical, descriptive, and reliable to replace ideology and conjecture to organize our thinking.

There are now >3000 articles on Major Depressive Disorder on PubMed – another ~1000 on Generalized Anxiety Disorder. There are ~1300 Clinical Trials on Major Depressive Disorder alone on ClinicalTrials.gov. All other considerations aside, these Field Trial results can’t just be ignored. If we are to continue to use diagnostic categories as a basis for clinical trials, in treatment recommendations, for funding of medical care, we need to do better than this. By our own design, reliability trumps anything else and transcends all of our other differences. If those numbers are spurious and reflect something idiosyncratic about the way the DSM-5 Field Trials were conducted, we need more than anecdotal guesses about what went wrong. If they reflect the true reliability [unreliability] of our diagnostic system in practice, we need to take notice and head back to the drawing board…
    September 24, 2012 | 8:23 AM


    Leading 21st-Century Department Requires Creativity, Patience

    “Psychiatry must redefine itself,” said T. Byram Karasu, M.D., the Silverman Professor and chair of the Department of Psychiatry and Behavioral Sciences at Albert Einstein College of Medicine and psychiatrist in chief at Montefiore Medical Center. Karasu recalled a time during his residency training when many psychiatrists exclusively practiced psychotherapy, a role since assumed largely by social workers and psychologists.

    “Even the prescription of psychotropic drugs has been assumed by family physicians and other medical specialties,” he added. “We need to convince government officials and insurance companies that we can fulfill a role that is essential and unique to us.”


    September 24, 2012 | 9:11 AM


    Psychiatric group: Parental alienation no disorder
    By DAVID CRARY AP National Writer / September 21, 2012


    “NEW YORK (AP) — Rebuffing an intensive lobbying campaign, a task force of the American Psychiatric Association has decided not to list the disputed concept of parental alienation in the updated edition of its catalog of mental disorders.”
