Recall that reliability has been measured as inter-rater reliability, using kappa as the operative index, by this formula:

κ = [pr(a) – pr(e)] / [1 – pr(e)]

where pr(a) is the observed agreement between raters and pr(e) is the agreement expected by chance.
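As a concrete sketch (the two-rater data here are hypothetical, not from the Field Trials), Cohen's kappa is just the observed agreement corrected for the agreement expected by chance:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """kappa = (p_o - p_e) / (1 - p_e): observed agreement p_o,
    corrected for the agreement p_e expected by chance alone."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical: two clinicians rating the same 10 patients (1 = disorder present)
a = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(round(cohens_kappa(a, b), 2))  # → 0.58
```

Note that the raters agree on 8 of 10 cases, yet kappa is only .58 — chance agreement is high because both raters call six of ten cases positive, which is why kappa always runs below raw percent agreement.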
As we approach time for the DSM-5 to be ratified by the Trustees of the American Psychiatric Association, here are the results of the DSM-5 Field Trials as I’ve been able to piece them together for review:
| Disorder | DSM-5 κ (95% CI) | DSM-IV | ICD-10 | DSM-III |
|---|---|---|---|---|
| Major neurocognitive disorder | .78 (.68 – .87) | — | .66 | .91 |
| Autism spectrum disorder | .69 (.58 – .80) | .85 | .77 | -.01 |
| Posttraumatic stress disorder | .67 (.59 – .74) | .59 | .76 | .55 |
| Child attention deficit disorder | .61 (.51 – .72) | .59 | .85 | .50 |
| Complex somatic disorder | .60 (.41 – .78) | — | .45* | .42* |
| Bipolar disorder | .54 (.43 – .65) | — | .69 | — |
| Oppositional defiant disorder | .41 (.21 – .61) | .55 | — | .66 |
| Major depressive disorder (in adults) | .32 (.24 – .40) | .59 | .53 | .80 |
| Generalized anxiety disorder | .20 (.02 – .36) | .65 | .30 | .72 |
| Disruptive mood dysregulation disorder | .50 (.32 – .66) | — | — | — |
| Schizophrenia | .46 | .76 | .79 | .81 |
| Mild neurocognitive disorder | .50 (.40 – .60) | — | — | — |
| Schizoaffective disorder | .50 | .54 | .51 | .54 |
| Mild traumatic brain injury | .46 (.28 – .63) | — | — | — |
| Alcohol use disorder | .40 (.27 – .54) | — | .71 | .80 |
| Hoarding | .59 (.17 – .83) | — | — | — |
| Binge eating | .56 (.32 – .78) | — | — | — |
| Major depressive disorder (in kids) | .29 (.15 – .42) | — | — | — |
| Borderline personality disorder | .58 (.46 – .71) | — | — | — |
| Mixed anxiety/depressive disorder | .06 | — | — | — |
| Conduct disorder | .48 | .57 | .78 | .61 |
| Antisocial personality disorder | .22 | — | — | — |
| Obsessive compulsive disorder | .31 | — | — | — |
| Attenuated psychosis syndrome | .46 (0-?) | — | — | — |
And below is a simple set of distribution graphs of kappa for this and previous revisions. On the left is a distribution of the Kappa values from our old system [DSM-II] using Dr. Spitzer’s 1974 meta-analysis of five reliability studies – values considered at the time unacceptable and a reason to radically revise the whole diagnostic system. We set our own standard back in those days, our own benchmark. Dr. Spitzer and his statistician colleagues essentially created and tested Kappa to be the reliability coefficient for psychiatric diagnosis. The other three graphs are from the actual Field Trials:
Before the Field Trials were reported in May, we were warned to lower our expectations:
DSM-5: How Reliable Is Reliable Enough?
by Helena Chmura Kraemer, David J. Kupfer, Diana E. Clarke, William E. Narrow, and Darrel A. Regier
American Journal of Psychiatry 2012 169:13-15.
… We previously commented in these pages on the need for field trials. Our purpose here is to set out realistic expectations concerning that assessment. In setting those expectations, one contentious issue is whether it is important that the prevalence for diagnoses based on proposed criteria for DSM-5 match the prevalence for the corresponding DSM-IV diagnoses. However, to require that the prevalence remain unchanged is to require that any existing difference between true and DSM-IV prevalence be reproduced in DSM-5. Any effort to improve the sensitivity of DSM-IV criteria will result in higher prevalence rates, and any effort to improve the specificity of DSM-IV criteria will result in lower prevalence rates. Thus, there are no specific expectations about the prevalence of disorders in DSM-5. The evaluations primarily address reliability.… Reliability will be assessed using the intraclass kappa coefficient κI. For a categorical diagnosis with prevalence P, among subjects with an initial positive diagnosis, the probability of a second positive diagnosis is κI + P[1 – κI], and among the remaining, it is P[1 – κI]. The difference between these probabilities is κI. Thus κI = 0 means that the first diagnosis has no predictive value for a second diagnosis, and κI = 1 means that the first diagnosis is perfectly predictive of a second diagnosis.
Reliability is essentially a signal-to-noise ratio indicator. In diagnosis, there are two major sources of “noise”: the inconsistency of expression of the diagnostic criteria by patients and the application of those criteria by the clinicians. It is all too easy to exaggerate reliability by removing some of that noise by design. Instead of a representative sample, as in DSM-5 field trials, one might select “case subjects” who are unequivocally symptomatic and “control subjects” who are unequivocally asymptomatic, omitting the ambiguous middle of the population for whom diagnostic errors are the most common and most costly. That approach would hide much of the patient-generated noise
… It is unrealistic to expect that the quality of psychiatric diagnoses can be much greater than that of diagnoses in other areas of medicine, where diagnoses are largely based on evidence that can be directly observed. Psychiatric diagnoses continue to be based on inferences derived from patient self-reports or observations of patient behavior. Nevertheless, we propose that the standard of evaluation of the test-retest reliability of DSM-5 be consistent with what is known about the reliability of diagnoses in other areas of medicine.
… From these results, to see a κI for a DSM-5 diagnosis above 0.8 would be almost miraculous; to see κI between 0.6 and 0.8 would be cause for celebration. A realistic goal is κI between 0.4 and 0.6, while κI between 0.2 and 0.4 would be acceptable. We expect that the reliability [intraclass correlation coefficient] of DSM-5 dimensional measures will be larger, and we will aim for between 0.6 and 0.8 and accept between 0.4 and 0.6. The validity criteria in each case mirror those for reliability…
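Kraemer et al.'s test-retest identities are easy to verify numerically. This sketch (the 10% prevalence is an assumed figure for illustration; the .32 kappa is the Field Trial value for adult MDD) computes the two conditional probabilities and shows that their difference recovers κI:

```python
def retest_probabilities(kappa_i, prevalence):
    """Kraemer et al.'s identities for intraclass kappa and prevalence P:
    P(second positive | first positive) = kI + P * (1 - kI)
    P(second positive | first negative) = P * (1 - kI)
    The difference between the two is kI itself."""
    p_given_pos = kappa_i + prevalence * (1 - kappa_i)
    p_given_neg = prevalence * (1 - kappa_i)
    return p_given_pos, p_given_neg

# kappa = .32 (adult MDD in the Field Trials), assumed 10% prevalence
pos, neg = retest_probabilities(0.32, 0.10)
print(round(pos, 3), round(neg, 3))  # → 0.388 0.068
```

So at a kappa of .32, a patient diagnosed with MDD once has only a .388 chance of getting the same diagnosis on re-examination — against a .068 baseline for everyone else.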
And here are the two versions of the interpretation of Kappa from Dr. Kraemer’s and Dr. Frances’ articles in January of this year…
| Kappa | <0.20 | 0.20–0.40 | 0.40–0.60 | 0.60–0.80 | >0.80 |
|---|---|---|---|---|---|
| Kraemer et al | negative | acceptable | realistic | celebration | miraculous |
| Allen Frances | negative | ~ no agreement | poor | fair | good |
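The two rubrics can be put side by side with a minimal lookup (a sketch: the boundary values are assigned to the higher band here, an assumption, since the original bands used strict inequalities on both sides):

```python
def interpret_kappa(k):
    """Map a kappa value onto the two interpretation schemes above.
    Band edges go to the higher band (an assumption)."""
    bands = [(0.20, "negative",    "negative"),
             (0.40, "acceptable",  "~ no agreement"),
             (0.60, "realistic",   "poor"),
             (0.80, "celebration", "fair")]
    for upper, kraemer, frances in bands:
        if k < upper:
            return kraemer, frances
    return "miraculous", "good"

print(interpret_kappa(0.32))  # adult MDD → ('acceptable', '~ no agreement')
```

The same .32 that Kraemer et al would call "acceptable," Frances would call essentially no agreement — which is the whole dispute in one line.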
Redefining the standards after the results are in just isn’t kosher [see HARKing = Hypothesizing After the Results are Known]. Just because it’s done in a lot of the Clinical Trial write-ups doesn’t mean it’s valid. Had Dr. Spitzer used Dr. Kraemer’s criteria in 1974, he wouldn’t have needed to revise much of anything in creating the DSM-III. And while all of the results of the Field Trials were disappointing, several values were particularly troubling. The representatives of the Task Force said of these very low results:
Disorders that did not do well include the following:
| Disorder | DSM-5 κ (95% CI) | DSM-IV | ICD-10 | DSM-III |
|---|---|---|---|---|
| MDD (in adults) | .32 (.24 – .40) | .59 | .53 | .80 |
| GAD | .20 (.02 – .36) | .65 | .30 | .72 |

In addition, MDD in children had a kappa of just .29 (.15 – .42). Dr. Regier reported that at the Dallas Veterans Affairs Medical Center in Texas, patients often had major depression along with PTSD, antisocial personality disorder, and mild TBI. "When other diagnoses were present, there tended to be a downplay of the reporting of depression in favor of the disorder considered more serious," he said. Jan Fawcett, MD, chair of the Mood Disorders Work Group, noted in a separate session that depression comorbidities were not allowed in the DSM-IV field trials. "It might be that that is the liability of that diagnosis in the real world," he said.
I’ve expressed my personal complaint about the DSM-5 enough times, so I won’t go into it here except to reiterate that I think this Task Force was primarily focused on escaping the atheoretical restraints of previous versions to finally achieve a neo-Kraepelinian dream of a biological basis for mental illness – and they failed. But that’s not my point here. For the moment, we have to respect what we don’t know and where we don’t agree, and find our way forward until the day when all is revealed. The DSM-III and its subsequent revisions were an attempt to do just that: to keep an open mind until things are clear, and to stick to what we do know. And with notable exceptions, the DSMs have been accepted as a fair shot at achieving some kind of objectivity in the squishy world we live and work in. Dr. Spitzer came up with "atheoretical, descriptive, and reliable" to replace ideology and conjecture as the way to organize our thinking.
fyi
Leading 21st-Century Department Requires Creativity, Patience
“Psychiatry must redefine itself,” said T. Byram Karasu, M.D., the Silverman Professor and chair of the Department of Psychiatry and Behavioral Sciences at Albert Einstein College of Medicine and psychiatrist in chief at Montefiore Medical Center. Karasu recalled a time during his residency training when many psychiatrists exclusively practiced psychotherapy, a role since assumed largely by social workers and psychologists.
“Even the prescription of psychotropic drugs has been assumed by family physicians and other medical specialties,” he added. “We need to convince government officials and insurance companies that we can fulfill a role that is essential and unique to us.”
http://psychnews.psychiatryonline.org/newsArticle.aspx?articleid=1361752
fyi
Psychiatric group: Parental alienation no disorder
By DAVID CRARY AP National Writer / September 21, 2012
http://www.boston.com/lifestyle/health/2012/09/21/psychiatric-group-parental-alienation-disorder/T4LXRkseoyCRDfuJUuffeN/story.html
“NEW YORK (AP) — Rebuffing an intensive lobbying campaign, a task force of the American Psychiatric Association has decided not to list the disputed concept of parental alienation in the updated edition of its catalog of mental disorders.”