recalculating…

Posted on Wednesday 6 April 2011

The NIMH-funded STAR*D trial [Sequenced Treatment Alternatives to Relieve Depression] was initiated in 2001. The theme of the American Psychiatric Association Meeting in New Orleans that year was Mind meets Brain [though the Brain clearly won the meet]. Charles Nemeroff had recently been nicknamed the "boss of bosses" and was the newly appointed Editor of Neuropsychopharmacology. Around that same time, he helped get David Healy fired from a new position in Toronto for reporting that Prozac could cause suicidal ideation [Let Them Eat Prozac was still three years in the future]. Alan Schatzberg, Chairman at Stanford, published his first trial of Mifepristone in Psychotic Depression with promising results. Zyprexa sales topped $2 billion and Seroquel was coming on strong. It was the dawn of a new century, and a time when clinical neurochemistry was the paradigm du jour.

STAR*D was an ambitious trial, up to the challenges of the era – 4000 subjects, $35M, four Levels of treatment. It was designed to show how the aggressive treatment of Depression could achieve not just simple responses, but remissions that lasted through a year’s follow-up. The study was also practical. They would select their subjects not by advertising, but from patients who actually sought treatment. They had some elaborate plans to keep patients engaged – brochures, patient education. And they would accept patients who were less profoundly depressed. It was set up to emulate the patients we see in our offices, with a protocol easily followed by Primary Care Physicians [18 of the 41 sites were Primary Care facilities].

In 2006 when the results of the STAR*D trial were finally published, the climate was changing. Dr. Nemeroff had been censured by Emory for Conflicts of Interest and failing to report industry income in 2004. Then he stepped down as Editor of Neuropsychopharmacology after getting busted for failing to report his financial ties to products he endorsed in his Journal. Schatzberg’s Mifepristone wasn’t faring so well either. The first two Phase III Clinical Trials were reported in the Fall of 2006 and failed to meet their goals for significance. And by this time, there was growing alarm about weight gain and the potential for Diabetes with the Atypical Antipsychotics, with a class-wide FDA warning in every single bottle of pills.

So, back to the study, as a reminder, here’s the abstract from the STAR*D article in the American Journal of Psychiatry:
    OBJECTIVE: This report describes the participants and compares the acute and longer-term treatment outcomes associated with each of four successive steps in the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial.
    METHOD: A broadly representative adult outpatient sample with nonpsychotic major depressive disorder received one (N=3,671) to four (N=123) successive acute treatment steps. Those not achieving remission with or unable to tolerate a treatment step were encouraged to move to the next step. Those with an acceptable benefit, preferably symptom remission, from any particular step could enter a 12-month naturalistic follow-up phase. A score of ≤5 on the Quick Inventory of Depressive Symptomatology–Self-Report (QIDS-SR16) (equivalent to ≤7 on the 17-item Hamilton Rating Scale for Depression [HRSD17]) defined remission; a QIDS-SR16 total score of ≥11 (HRSD17 ≥14) defined relapse.
    RESULTS: The QIDS-SR16 remission rates were 36.8%, 30.6%, 13.7%, and 13.0% for the first, second, third, and fourth acute treatment steps, respectively. The overall cumulative remission rate was 67%. Overall, those who required more treatment steps had higher relapse rates during the naturalistic follow-up phase. In addition, lower relapse rates were found among participants who were in remission at follow-up entry than for those who were not after the first three treatment steps.
    CONCLUSIONS: When more treatment steps are required, lower acute remission rates (especially in the third and fourth treatment steps) and higher relapse rates during the follow-up phase are to be expected. Studies to identify the best multistep treatment sequences for individual patients and the development of more broadly effective treatments are needed.
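That 67% "overall cumulative remission rate" is worth pausing over before going on. As best I can tell, it isn’t a count of patients who actually got well and stayed in the study; it’s a theoretical compounding of the four step-wise rates, assuming every non-remitter moved on to the next step and remitted at that step’s published rate. Here’s a minimal sketch of that arithmetic in Python [the no-dropout assumption is mine, read off the abstract, not something the abstract spells out]:
    # Reproducing the abstract's 67% cumulative remission figure.
    # Assumption [mine]: every patient who fails a step is carried forward to the
    # next step and remits at that step's published rate, with no dropouts.
    step_remission = [0.368, 0.306, 0.137, 0.130]   # steps 1-4, from the abstract

    still_depressed = 1.0
    for rate in step_remission:
        still_depressed *= 1.0 - rate               # fraction not yet remitted

    print(f"cumulative remission: {1.0 - still_depressed:.1%}")   # prints ~67%
Whether that no-dropout assumption holds up is, as it turns out, a big part of the story below.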
When STAR*D was reported, there was a companion Editorial, The STAR*D Study: A Four-Course Meal That Leaves Us Wanting More by J. Craig Nelson, M.D. It was generally complimentary, but added:
    The investigators are to be applauded for their emphasis on remission and their inclusion of the relapse data. Together these data start to inform us about sustained recovery in depression. In my opinion, the authors have cited the positive side of the coin here. They note that after four treatments the cumulative remission rate is 67%. But this does not account for relapse. If the goal of treatment is sustained recovery, relapse should be considered. I found a cumulative sustained recovery rate of 43% after four treatments, using a method similar to the authors but taking relapse rates into account…
If you look back at my first post on STAR*D a few days back, I was having the same kind of problem as Dr. Nelson [a thirty-five million dollar misunderstanding…]. There was something confusing about the numbers and I was driven to recalculate [like that computer lady in my truck’s GPS says when I stop to eat, "recalculating"]. To be honest, I spent several days going over and over that article trying to decipher the tables, feeling dumb. Robert Whitaker apparently had the same difficulty [The STAR*D Scandal: A New Paper Sums It All Up]. This epidemic need to "recalculate" may, in fact, be the most important piece of data in the whole STAR*D report. What it tells us is that the authors of that article had something to hide, something they hoped we wouldn’t notice. Why would there be something to hide in an NIMH study with such a well-thought-out design? costing 35 million dollars? seven years in the making? enrolling 4000 subjects? in Psychiatry’s primo Journal? ultimately generating over seventy scholarly papers? Now there’s a research question worth sinking one’s teeth into!

Dr. Ed Pigott, a Maryland Psychologist, was up to the task. I ran across his work through Robert Whitaker’s site [Pigott and colleagues wrote the "New Paper" Whitaker referred to above]. In an email, Pigott writes, "For 5+ years now, I’ve been obsessed with deconstructing STAR*D by comparing its published methods and findings with STAR*D’s pre-specified research measures and analytic plan as described in primary source documents." And that’s apparently what it took to get STAR*D recalculated. I’m just going to hit the high points of what he found, but if you care at all about STAR*D or duplicity in Academic Psychiatry [or if you just enjoy some fine sleuthing], I’d recommend you read his most recent article in toto [STAR*D: A tale and trail of bias, Ethical Human Psychology and Psychiatry, 13(1), 6–28, 2011]. He’ll be glad to send it to you free [if you’ll return the favor by sending him your comments] [go here, at the bottom of the page].

Here’s what he found the first time through:

    They included a large number of subjects who did not meet admission requirements for their study: STAR*D changed its eligibility for analysis criteria in the steps 2–4 and summary articles without making this change explicit to readers. This change resulted in 607 patients who were initially reported as excluded because their <14 score on the ROA-administered baseline HRSD signified at most only mild depressive symptoms when starting on citalopram (Celexa) in step 1 subsequently being included. Similarly, an additional 324 patients who were initially reported as excluded because they lacked a baseline ROA-administered HRSD in step 1 were subsequently included. Thus, 931 of STAR*D’s 4,041 patients (23% of all subjects) did not meet its step-1 eligibility for analysis criteria but were included in the steps 2–4 and summary articles’ analyses.
    STAR*D failed to disclose that all 4,041 patients were started on citalopram (Celexa) in their initial baseline visit and that they excluded from analysis the 370 patients who dropped out without any subsequent visits, although the step-1 article states, “our primary analyses classified patients with missing exit HRSD scores as nonremitters a priori”. These early dropout patients did not take the exit HRSD and therefore should have been counted as treatment failures as prespecified.
    STAR*D changed its outcome measures following data collection. As designed, STAR*D’s prespecified primary measure was the HRSD, with the Inventory of Depressive Symptomatology—Clinician-Rated (IDS-C30) as the secondary one, for identifying “remitted” (i.e., those with a ≤7 HRSD score) and “responder” (i.e., those with a ≥50% reduction in depressive symptoms) patients. These measures were obtained in interviews by research outcome assessors (ROAs) blind to treatment assignment at entry into and exit from each trial and every 3 months during the 12 months of continuing care.
    STAR*D dropped the IDS-C30 and replaced it with the Quick Inventory of Depressive Symptomatology–Self-Report (QIDS-SR), a proprietary tool developed by STAR*D’s principal investigators. The QIDS-SR was not a prespecified research measure, but rather one of STAR*D’s “clinical management tools” that was used to guide care during every treatment visit. In the Pigott et al. paper’s peer-review process, a reviewer wrote that the National Institute of Mental Health’s (NIMH) Data Safety and Monitoring Board (DSMB) authorized the use of the QIDS-SR prior to “data lock and unblinding” because of STAR*D’s high study dropout rate, which frequently resulted in missing exit IDS-C30 and HRSD assessments. Pigott et al. made this change in the paper, even though it could not be documented in the published literature. Subsequently, this author learned that no such DSMB authorization occurred (see the succeeding discussion).
    STAR*D did not disclose how to interpret the quarter-by-quarter survival data for continuing care patients and thereby obscured from readers their startling finding that only 108 of the 1,518 remitted patients (7.1%) had not relapsed and/or dropped out by continuing-care’s 12th month (see Table 2).
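That last point is easy to read past, so here’s the arithmetic, done two ways – against the 1,518 remitted patients in the bullet above, and [my own framing, not a figure Pigott gives in that sentence] against the full 4,041 who enrolled:
    # The last bullet's numbers, two ways [my arithmetic, not a table from the paper].
    remitted_patients   = 1518   # remitted patients in continuing care, per the bullet above
    still_well_month_12 = 108    # neither relapsed nor dropped out by month 12
    enrolled            = 4041   # everyone who entered STAR*D

    print(f"{still_well_month_12 / remitted_patients:.1%}")   # ~7.1%, the figure Pigott cites
    print(f"{still_well_month_12 / enrolled:.1%}")            # ~2.7% of everyone who enrolled
However you frame it, the fabled durable remission turns out to be a rare event in this study.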
In research, you declare what you’re going to do in advance. How you’re going to select your subjects. How you’re going to intervene. What you’re going to measure along the way. You specify how you’re going to ensure that your ratings are "blind." The reasons are obvious. If you decide these things once you have the outcome or are on the way, you can jury-rig it to go your way. This is just research 101. These investigators added in 931 subjects who didn’t meet their criteria for inclusion. They threw out 370 subjects who had been started on medication and enrolled in the study. They changed their primary outcome measure after the study was underway [that part is going to turn out to be even worse shortly]. And they hid the fact that there were actually very few subjects who stayed in the study and had the fabled remission without relapse through the year of follow-up. These were major infractions of basic research protocol – unacceptable infractions. But that’s not the worst part of it.

If you follow these things, you probably recognized the HRSD [Hamilton Rating Scale for Depression]. It’s the long-used scale to measure depressive symptoms. It’s administered by a clinician who rates 17 items by their severity. There was another, the IDS-C [Inventory of Depressive Symptomatology–Clinician Rated], that is less well known [at least by me]. Then they mention the QIDS-SR [Quick Inventory of Depressive Symptomatology–Self-Report] and [in the paper] the QIDS-C [Quick Inventory of Depressive Symptomatology–Clinician Rated] – a scale developed by the STAR*D authors themselves.

So what happened? The study protocol mentioned using the QIDS-SR only as a tool for the clinicians when they saw the subjects – adjusting meds etc. But when the study was published, the authors said:
    We used the Quick Inventory of Depressive Symptomatology–Self-Report (QIDS-SR) as the primary measure to define outcomes for acute and follow-up phases because:
      1. QIDS-SR ratings were available for all participants at each acute treatment clinic visit
      2. QIDS-SR and HRSD outcomes are highly related
      3. the QIDS-SR was not used to make treatment decisions, which minimizes the potential for clinician bias
      4. the QIDS-SR scores obtained from the interactive voice response system, the main follow-up outcome measure, and the paper-and-pencil QIDS-SR16 are virtually interchangeable, which allows us to use a similar metric to summarize the acute and follow-up phase results.
This is the kind of thing you skim right over when reading an article, if you even read that part at all. I actually noticed it was strange when I read through it, but I didn’t linger. Well, Dr. Pigott lingered. He even questioned the change when he submitted an early article, but a reviewer of his paper said:
    "… that the National Institute of Mental Health’s (NIMH) Data Safety and Monitoring Board (DSMB) authorized the use of the QIDS-SR prior to ‘data lock and unblinding’ because of STAR*D’s high study dropout rate, which frequently resulted in missing exit IDS-C30 and HRSD assessments."
So Pigott changed his paper. But then he filed a Freedom of Information Act request for the NIMH documents.
    The author received from NIMH the contract and research protocol, which included the analytic plan but not the DSMB meeting minutes or any quarterly or annual progress reports. The author was informed that these latter documents could not be located and may have been destroyed. Despite not receiving all that was requested, the contract, research protocol, and analytic plan were very helpful in providing additional information in understanding STAR*D’s original purpose, measures, methods, and planned analyses.
What he found out was pretty significant. After a second FOIA request and a conference call with two of the principals, he learned that the NIMH DSMB had not authorized this change in outcome measure.
    Both denied that STAR*D’s DSMB authorized using the QIDS-SR prior to “data lock and unblinding” as Pigott et al. had been informed by the reviewer. Instead, they stated that this decision occurred in the Communications Committee meetings for which notes were not taken as they were for the DSMB meetings.
So the QIDS-SR was in the clinicians’ hands at every treatment visit, guiding care – meaning it was not blind – meaning that the statement "the QIDS-SR was not used to make treatment decisions, which minimizes the potential for clinician bias" above was a lie.

I’m going to stop here for a breather. That’s enough for a single blog post [but by no means all]. Looking through all this psychopharmacology literature these last months, I’ve gotten used to inappropriate correction methods, jury-rigged statistical analyses, omitted information, overblown conclusions, rationalizations – all those things people do to obfuscate the things they don’t want you to see and magnify what they want the study to say. But if there’s outright lying, I haven’t detected it yet. And if it were going to happen, I’d expect to find it in some ghost-written pharmaceutical-funded clinical trial in a journal of the lesser gods, not a mega-study from the NIMH in the American Journal of Psychiatry.

I even feel a little guilty using that word ["lie"], but it is what it is. What does that mean about STAR*D? How does it fit in that little historical narrative I started with? That’s my real reason for the breather. I want to think about that for a bit…
