what’s the hurry?…

Posted on Thursday 13 March 2014

My first encounter with the work of Dr. Robert Gibbons, a statistician at the University of Chicago, was a couple of articles in which he challenged the Black Box warning on antidepressants for adolescents – declaring them both safe and effective. His analysis was based on a private, unavailable dataset. Besides the unavailability of the data to check his work, there was a well-planned media roll-out with many interviews accompanying his papers [an anatomy of a deceit 1…]. Later, I noted a similar low-data, high-visibility approach to the adverse effects of Chantix and Neurontin – again downplaying adversity and upgrading efficacy [very monotonous…]. And then recently, he published a flurry of articles on a new psychometric instrument for screening for depression and anxiety – this time forming a company with Dr. David Kupfer to sell the test commercially. The glaring conflict-of-interest problems have been discussed ad nauseam here [see open letter to the APA…].

As before, there’s a lot of hype about these tests, but they’re hardly revolutionary. The only real difference between standard paper-and-pencil psychometrics and these adaptive tests is speed: rather than answering a fixed set of questions, the examinee answers questions the computer picks from a bank based on the previous answers, reaching a conclusion more quickly. How do they stack up? Dr. Bernard Carroll criticized the article in a letter to JAMA Psychiatry earlier:

by Bernard Carroll
JAMA Psychiatry. 2013;70[7]:763

"The goal of commercial development seems premature; patients risk being “assayed” against a non–gold standard. Though CAT-DI may have been an interesting statistical challenge, it lacks a solid clinimetric grounding. It is not ready for clinical use…"
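To make concrete what "adaptive" means here, a toy version of the item-selection loop is sketched below. The item bank, difficulty values, and update rule are all invented for illustration – they bear no relation to the CAT-DI's actual item bank or its item-response-theory model:

```python
# Toy adaptive screener: each answer nudges a severity estimate, and the next
# question is the not-yet-asked item whose "difficulty" is closest to that
# estimate. Everything below is a made-up illustration, not the CAT-DI.

bank = [
    ("I feel sad most of the day", 0.2),
    ("I have lost interest in things I enjoy", 0.4),
    ("I have trouble sleeping", 0.5),
    ("I feel hopeless about the future", 0.7),
    ("I have thoughts of harming myself", 0.9),
]

def screen(answers, max_items=3):
    """answers maps item text -> True/False; returns (estimate, items asked)."""
    estimate, asked = 0.5, []
    for _ in range(max_items):
        # pick the unasked item whose difficulty is closest to the estimate
        item, difficulty = min(
            ((text, d) for text, d in bank if text not in asked),
            key=lambda pair: abs(pair[1] - estimate),
        )
        asked.append(item)
        # crude update: a "yes" pushes the estimate up, a "no" pushes it down
        estimate += (1.0 - estimate) / 2 if answers[item] else -estimate / 2
    return estimate, asked

severity, items = screen({text: True for text, _ in bank})
```

The point of the sketch is only that the saving is in the number of questions asked, nothing more – which is exactly why "ready for clinical use" is a separate question entirely.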

Now Neuroskeptic also takes a look at the reported outcomes for the CAT-DI Adaptive Test:

Discover
by Neuroskeptic
March 12, 2014

Now, I’m finally going to delve into the statistics to find out: does it really work? The CAT-DI was revealed in a 2012 paper by Robert Gibbons and colleagues in the prestigious Archives of General Psychiatry. In this article (which has been previously criticized), the authors, after introducing the theoretical background of the method, and describing its development, compared the CAT-DI against three widely used, old-fashioned pen-and-paper depression questionnaires, the HAMD, the PHQ9, and CES-D.

Gibbons et al examined the ability of each of these four measures to distinguish between three groups of people: those diagnosed with no depression, with minor depression, or with major depression. An ideal depression scale ought to give, respectively, low, medium and high scores for these three different groups. The importance of this comparison can hardly be overstated. It asks the question: is the CAT-DI any better than what we already have? What, if anything, does the new kid bring to the party? And this is the only head-to-head comparison of the CAT-DI’s performance in the paper…
When Neuroskeptic tried to vet the study, he ran into the same problem I’ve had with each of Dr. Gibbons’ outings:
… However, remarkably, Gibbons et al give almost no details about the results. This is all they say about it in the Results section:
    In general, the distribution of scores [on the traditional questionnaires] among the diagnostic categories [no depression, minor, major] showed greater overlap (ie, less diagnostic specificity particularly for no depression vs minor depression), greater variability, and greater skewness, for these other scales relative to the CAT-DI
I did a double-take when I realized that this was all we’re given. ‘In general’? No p-values? No confidence intervals? No numbers of any kind (except for some descriptive stats for the CAT-DI group only)? ‘In general’, one would expect those things in a scientific paper.
That’s been my problem every time I try to look over Gibbons’ articles – there’s not enough information to get anywhere. Neuroskeptic looked at the graphs comparing the different tests by actually counting pixels [which is pretty ingenious].
The take-home conclusion:
Overall proportional overlap – which I defined as the total of the two overlaps between the adjacent bars, divided by the total length of the three bars – was identical to within the margin of error [i.e. 1 pixel] but for what it’s worth, the CES-D was marginally better [with 0.397 ratio vs 0.399]. This is an… unorthodox approach to psychometrics I’ll be the first to admit, but it’s the best that I could do given the [lack of] information provided in the paper, and I feel that it’s more rigorous than just saying ‘in general’.
[I told you it was ingenious!]. He goes on to report on a 2010 article by one of Dr. Gibbons’ fellow authors [Dr. Pilkonis] that had pointed to the limitations of the CAT tests.
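For the curious, the overlap ratio Neuroskeptic describes is simple to compute once you have the bar endpoints. Here is a minimal sketch – the endpoints below are made-up illustration values standing in for his pixel counts, not the paper's actual figures:

```python
# Neuroskeptic's "proportional overlap": sum of the overlaps between adjacent
# score bars, divided by the total length of the three bars.

def overlap(bar_a, bar_b):
    """Length of the overlap between two (low, high) intervals."""
    return max(0.0, min(bar_a[1], bar_b[1]) - max(bar_a[0], bar_b[0]))

def proportional_overlap(bars):
    """Adjacent-bar overlaps divided by total bar length."""
    adjacent = sum(overlap(a, b) for a, b in zip(bars, bars[1:]))
    total = sum(high - low for low, high in bars)
    return adjacent / total

# three score ranges: no depression, minor, major (hypothetical endpoints)
bars = [(0.0, 40.0), (30.0, 70.0), (55.0, 100.0)]
print(round(proportional_overlap(bars), 3))  # prints 0.2
```

A lower ratio means the three diagnostic groups are better separated – which is why a CES-D ratio of 0.397 against a CAT-DI ratio of 0.399 is such a damning non-result for the new test.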

I’ll have to admit that Dr. Gibbons’ papers are hard for me to evaluate – scientifically and emotionally. I don’t agree with his conclusions before I read his arguments. I don’t think antidepressants are either effective or benign in kids. I’m suspicious that a quickie test like this will end up in the waiting rooms of physicians and result in even more over-medication. I’m put off by the hype, the invariable commercial overtones, and the absence of useful, checkable data in his papers. And, by nature, I prefer gathering subjective data subjectively – by an unhurried interview. I don’t see these things as conflicts of interest, but rather differences of values.

The most a Computerized Adaptive Test can offer over a conventional psychometric is speed, measured in minutes [and what’s the hurry?]. Intuitively, I can’t see them as anything but less precise longitudinally – as each iteration is a different test. So I don’t get the need, and I sure don’t get all the hype. I would personally like to see doctors spending their waiting room budgets on more comfortable chairs, a wider variety of magazines [current issues], and banning news channels and industry videos on their television sets. If you want to know if I’m depressed or anxious, ask me…
  1.  
    Arby (Not a Doctor)
    March 13, 2014 | 11:44 AM
     

    Great article. I disdain the mood assessment checklists and made the point about their validity and misuse in a comment on Psychiatric Times last summer. However, the perception that depression is underdiagnosed and needs to be uncovered (whether correct or not), combined with the lack of will (or money) to spend the requisite time with patients, encroaches on the visit with or without the use of adaptive checklists.

    I went for a screening colonoscopy last fall and it didn’t faze me in the prep room prior that I was asked if I had depression. I am used to that question now, because I have been asked this on all types of visits, including one to an allergist. What I wasn’t prepared for was the next question the nurse asked: whether I was suicidal. I almost answered “yes” just to see what they would do for someone having a colonoscopy in 20 minutes who was suicidal, but I didn’t because a) it would be very wrong and b) I do not need that in my chart for obvious reasons.

    I am not making light of suicidal ideations. My point is that the more farcical the depression screening gets (with or without forms), the less the public will see it as legitimate or helpful, and those that are truly suicidal will still be lost in the weeds.

    And, yes, brilliant idea on the pixel count. When photo editing (or when trying to see if something has been edited) there is often no other way but to go down to the pixel level.

  2.  
    March 13, 2014 | 2:50 PM
     

    How did you, a non-doctor, manage to comment on Psychiatric Times?

  3.  
    Arby (Not a Doctor)
    March 13, 2014 | 4:17 PM
     

    How did you, a non-doctor, manage to comment on Psychiatric Times?

    I know the secret handshake.

    Actually, the answer is in my comment above. It was last summer. Then anyone registered could comment. I know they were still letting the public comment as late as Aug 7, 2013 because that is when I wrote one on an article by Gerard Sanacora, MD, PhD. It wasn’t disrespectful, but it basically called him out on being disingenuous in one of his statements in that article. It displayed for all of 24 hours before they deleted it. Looks like they have a habit of removing things they would rather not deal with. I honestly don’t know what date they restricted the rights to comments after this.

  4.  
    Tom
    March 13, 2014 | 8:03 PM
     

    Neuroskeptic is a genius and a seeker of the truth. Who the hell vetted this paper for the Archives? What Neuroskeptic pointed out regarding the utter lack of data to support conclusions is mind blowing. How does this happen? Isn’t the Archives a flagship journal in Psychiatry?

  5.  
    Joseph Arpaia
    March 14, 2014 | 12:07 AM
     

    The checklists and diagnoses that come from them describe a patient’s mental state analogously to using 8 colors to paint a landscape. There is no nuance, no expression of feeling, no exploration of the interplay among values, challenges, betrayals, hopes, … the real stuff of life. I find the most useful rating scale with my patients to be “So, what has been happening with you since we last met?”

  6.  
    March 14, 2014 | 9:56 AM
     

    Thanks for the kind words, everyone.

    When I was first reading Gibbons et al, I was struck by the lack of detail in the main paper, but I wasn’t too concerned because I expected to find it in the Supplementary Material (SM).

    However, the SM, while extensive, doesn’t contain any extra detail on the performance of the CAT. It consists of several pages of technical detail on the mathematics, and tables listing the many questionnaires that were used to provide items for the CAT, but nothing on how well it works.

    Also, although there is an “impressively” large amount of information in the SM, anyone who wants to replicate the authors’ analyses will go away empty-handed, as the relevant details (e.g. the raw data from the calibration study, amongst other things) are not included.

  7.  
    AA
    March 16, 2014 | 5:24 AM
     

    “”I am not making light of suicidal ideations. My point is that the more farcical the depression screening gets (with or without forms), the less the public will see it as legitimate or helpful, and those that are truly suicidal will still be lost in the weeds.””

    Exactly Arby. By the way, when the issue of toughening mental health commitment laws came up in my state, I will never forget this one person who was suicidal every day but refused to get help out of fear of being committed against her will.

    So what if you had said you were suicidal, would you have gotten an SSRI as part of your sedation? Anyway, since I am having surgery next month, I guess I had better be prepared for this extreme absurdity.

Sorry, the comment form is closed at this time.