study 329 vi – revisited…

Posted on Wednesday 16 September 2015

Well, our RIAT article, Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence, is finally published online at the British Medical Journal. It’s fairly straightforward. The emphasis is on the harms analysis for obvious reasons – an accurate representation of a drug’s safety is always the first order of business. Here’s the abstract and links for the 2001 original publication:
by MARTIN B. KELLER, M.D., NEAL D. RYAN, M.D., MICHAEL STROBER, PH.D., RACHEL G. KLEIN, PH.D., STAN P. KUTCHER, M.D., BORIS BIRMAHER, M.D., OWEN R. HAGINO, M.D., HAROLD KOPLEWICZ, M.D., GABRIELLE A. CARLSON, M.D., GREGORY N. CLARKE, PH.D., GRAHAM J. EMSLIE, M.D., DAVID FEINBERG, M.D., BARBARA GELLER, M.D., VIVEK KUSUMAKAR, M.D., GEORGE PAPATHEODOROU, M.D., WILLIAM H. SACK, M.D., MICHAEL SWEENEY, PH.D., KAREN DINEEN WAGNER, M.D., PH.D., ELIZABETH B. WELLER, M.D., NANCY C. WINTERS, M.D., ROSEMARY OAKES, M.S., AND JAMES P. MCCAFFERTY, B.S.
Journal of the American Academy of Child and Adolescent Psychiatry, 2001, 40[7]:762–772.

Objective: To compare paroxetine with placebo and imipramine with placebo for the treatment of adolescent depression.
Method: After a 7 to 14-day screening period, 275 adolescents with major depression began 8 weeks of double-blind paroxetine [20–40 mg], imipramine [gradual upward titration to 200–300 mg], or placebo. The two primary outcome measures were endpoint response [Hamilton Rating Scale for Depression [HAM-D] score <8 or >50% reduction in baseline HAM-D] and change from baseline HAM-D score. Other depression-related variables were [1] HAM-D depressed mood item; [2] depression item of the Schedule for Affective Disorders and Schizophrenia for Adolescents-Lifetime version [K-SADS-L]; [3] Clinical Global Impression [CGI] improvement scores of 1 or 2; [4] nine-item depression subscale of K-SADS-L; and [5] mean CGI improvement scores.
Results: Paroxetine demonstrated significantly greater improvement compared with placebo in HAM-D total score <8, HAM-D depressed mood item, K-SADS-L depressed mood item, and CGI score of 1 or 2. The response to imipramine was not significantly different from placebo for any measure. Neither paroxetine nor imipramine differed significantly from placebo on parent- or self-rating measures. Withdrawal rates for adverse effects were 9.7% and 6.9% for paroxetine and placebo, respectively. Of 31.5% of subjects stopping imipramine therapy because of adverse effects, nearly one third did so because of adverse cardiovascular effects.
Conclusions: Paroxetine is generally well tolerated and effective for major depression in adolescents.


[upper figure from original] [lower figure from the next paper]

And here’s our RIAT republication:
by Le Noury J, Nardo J, Healy D, Jureidini J, Raven M, Tufanaru C, and Abi-Jaoude E.
British Medical Journal. 2015 …

Objectives: This is a reanalysis of SmithKline Beecham’s Study 329 [published by Keller et al. in 2001], the primary objective of which was to compare the efficacy and safety of paroxetine and imipramine to placebo in the treatment of adolescents with unipolar major depression. The objective of this restoration under the Restoring Invisible and Abandoned Trials [RIAT] initiative was to see whether access to and reanalysis of a full dataset from a randomised controlled trial would have clinically relevant implications for evidence based medicine.
Design: Double-blind randomised placebo-controlled trial. Setting: 12 North American academic psychiatry centres, from 20 April 1994 to 15 February 1998.
Participants: 275 adolescents with major depression of at least 8 weeks in duration. Exclusion criteria included a range of comorbid psychiatric and medical disorders and suicidality. Interventions: Participants were randomised to 8 weeks double-blind treatment with paroxetine [20-40 mg], imipramine [200-300 mg], or placebo.
Main outcome measures: The pre-specified primary efficacy variables were: change from baseline to the end of the 8-week acute treatment phase in total Hamilton Depression Scale [HAM-D] score; and the proportion of responders [HAM-D score <8 or >50% reduction in baseline HAM-D] at acute endpoint. Pre-specified secondary outcomes were [1] changes from baseline to endpoint in the following parameters: depression items in K-SADS-L; Clinical Global Impression; Autonomous Functioning Checklist; Self-Perception Profile; Sickness Impact Scale, [2] predictors of response, [3] number of patients who relapse during the maintenance phase. Adverse experiences were to be compared primarily by using descriptive statistics. No coding dictionary was pre-specified.
Results: The efficacy of paroxetine and imipramine was not statistically or clinically significantly different from placebo for any pre-specified primary or secondary efficacy outcome. HAM-D scores decreased by 10.73 [9.134 to 12.328], 8.95 [7.356 to 10.541] and 9.08 [7.450 to 10.708] points, least-squares mean [95% Confidence Interval], respectively, for the paroxetine, imipramine and placebo groups [p = 0.204]. Clinically significant increases in harms were observed, including suicidal ideation and behaviour and other serious adverse events in the paroxetine group and cardiovascular problems in the imipramine group.
Conclusions: Neither paroxetine nor high-dose imipramine demonstrated efficacy for major depression in adolescents, and there was an increase in harms with both drugs. Access to primary data from trials has important implications for both clinical practice and research, including that published conclusions about efficacy and safety should not be read as authoritative. The reanalysis of Study 329 illustrates the necessity of making primary trial data available to increase the rigour of the evidence base.

Keller et al concluded that Paroxetine was effective in adolescent depression in this study, and we found that it wasn’t. The difference was not in the data. The data was the same. And while there were some differences in the analytic approach, that doesn’t explain the different conclusion either. The difference was in what the two groups considered to be the outcome measures. We followed SKB’s a priori protocol definition of primary and secondary variables, whereas they added four new outcomes. Without those add-on outcomes, there were no significant findings. From our paper [page 4]:
Both before and after breaking the blind, however, the sponsors made changes to the secondary outcomes… We could not find any document that provided any scientific rationale for these post hoc changes, and the outcomes are therefore not reported in this paper…

Outcome variables not specified in protocol:
There were four outcome variables in the CSR and in the published paper that were not specified in the protocol. These were the only outcome measures reported as significant. They were not included in any version of the protocol as amendments (despite other amendments), nor were they submitted to the institutional review board. The CSR (section 3.9.1) states they were part of an “analysis plan” developed some two months before the blinding was broken. No such plan appears in the CSR, and we have no contemporaneous documentation of that claim, despite having repeatedly requested it from GSK.
In the paper, they didn’t explain adding these new outcomes. It just happened. In the CSR [Clinical Study Report] made public in 2004, they described the process of this change ["analysis plan"], but the "why" didn’t come until several years later in a response to a challenge. In the course of our re-analysis, I think we ended up with a fairly clear picture of that story. Several things along the way were involved. The simplest was an omission in the original a priori protocol. They neglected any discussion of correcting for multiple variables. With as many variables as were assessed in this study, that was a very significant omission. Had they applied absolutely any of the available correction schemes needed as a check on false positives, even with the other changes, they could never have claimed a positive efficacy outcome.
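
For readers who haven’t met multiple-comparison corrections, here is a minimal sketch in Python of two standard schemes, Bonferroni and Holm. The p-values are invented for illustration and are not taken from Study 329. The function names are hypothetical, and nothing here is the method used in either paper. It only shows how a few nominally "significant" results can vanish once the number of outcomes tested is taken into account.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each test as significant only if p <= alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: test p-values from smallest to largest against
    alpha / (m - rank), stopping at the first one that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            significant[i] = True
        else:
            break  # every larger p-value fails as well
    return significant


# Illustrative p-values only (NOT from Study 329): eight outcomes tested.
p_values = [0.02, 0.04, 0.01, 0.13, 0.25, 0.60, 0.07, 0.45]

uncorrected = [p <= 0.05 for p in p_values]
print("uncorrected:", sum(uncorrected))           # three fall below 0.05
print("bonferroni: ", sum(bonferroni(p_values)))  # none survive
print("holm:       ", sum(holm(p_values)))        # none survive
```

With eight outcomes, the Bonferroni threshold drops from 0.05 to 0.00625, so a smallest p-value of 0.01 no longer counts. Holm is less conservative in general, but it fails at the first step here for the same reason.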

There were several other subtle maneuvers that shaped their outcome claims. I’ll leave them for later. Right now, it’s enough to ask you to look over our paper itself. But I will say that the erroneous conclusion in Keller et al can hardly be chalked up to a mistake. It shows too many tell-tale signs of intention. But in spite of all the documents amassed about this study, there’s no way for anyone to really know all the forces at work fourteen years ago when this article was written, or what the process among the various players and authors actually was. We do know that the CSR [Clinical Study Report] used by ghost-writer Sally Laden already contained these subtleties, so I presume that they happened before the summary ever got to her. That’s about as close as I can get. The next several posts are what I can piece together about this story…

  1.  
    Tom
    September 16, 2015 | 10:30 PM
     

    Nice job. And you made the NY Times!

  2.  
    Johanna
    September 16, 2015 | 10:35 PM
     

    CONGRATULATIONS! So far y’all are in the New York Times, the Guardian, the Sydney Morning Herald, Time Magazine, MedPage Today and Canadian Broadcasting Corp … THANKS for years of hard work.

    I did notice that the Sydney papers have interviewed Jureidini, and the CBC has interviewed Elia Abi-Jaoude. So when is the Journal-Constitution gonna make it up the mountain to interview you?

  3.  
    September 16, 2015 | 10:46 PM
     

    Actually, I’ve moved to the North Georgia Mountains. My local paper is a weekly and extremely local.

  4.  
    Bernard Carroll
    September 16, 2015 | 11:59 PM
     

    Looking good, Dr. Mickey. Congrats. That was one heck of an effort. Loved your graphs and tables, too – each worth a thousand words.

  5.  
    September 17, 2015 | 1:30 AM
     

    congrats old man 😉

  6.  
    September 17, 2015 | 9:08 AM
     

    Stellar work from all involved, great to see the restored study make headline news in so many news outlets across the globe… well done 🙂

  7.  
    Fiachra
    September 17, 2015 | 6:26 PM
     

    Hi 1boring old man,
    It sounds like these drugs are no good for depression, but that they can do a lot of harm.
    I think that this is probably the same for all SSRIs (why distinguish).

    As regards the tricyclic I’ve been on one of those (not imipramine) years ago, and it made no difference at all to my mood (going on it, being on it, or coming off it).

    I think that the Public has been completely taken for a ride.
