study 329 x – “it wasn’t sin – it was spin”…

Posted on Thursday 17 September 2015

[Note: the Press coverage of our article is on study329.org, but I wanted to mention the article on Retraction Watch because it has Dr. Martin Keller’s response to our paper with an argument similar to the one below…]

We know from this internal memo and position piece that the initial SKB interpretation of the efficacy results from Study 329 mirrored those reported in our RIAT article:
14 OCT 1998

Please find attached to this memo a position piece, prepared by Julie Wilson of CMAT, summarising the results of the clinical studies in Adolescent Depression.

As you will know, the results of the studies were disappointing in that we did not reach statistical significance on the primary end points and thus the data do not support a label claim for the treatment of Adolescent Depression. The possibility of obtaining a safety statement from this data was considered but rejected. The best which could have been achieved was a statement that, although safety data, was reassuring, efficacy had not been demonstrated. Consultation of the Marketing Teams via Regulatory confirmed that this would be unacceptable commercially and the decision to take no regulatory action was recently endorsed by the TAT.

As you will see from the position piece the positive trends in efficacy which were seen in Study 329 are being published as a poster at ECNP this year and a full manuscript is in development. Published references will therefore be available for the study. There are no plans to publish data from Study 377.

This report has been prepared for internal use only. Data on File summaries will be prepared and issued once the final reports from the studies have been approved. This position piece will also be available on the Seroxat/Paxil resource database.

TARGET [from the Wilson position piece mentioned above]
To effectively manage the dissemination of these data in order to minimize any potential negative commercial impact
This was, indeed, a negative study, though the published article reached the opposite conclusion [2001]:
Paroxetine is generally well tolerated and effective for major depression in adolescents.
Three years ago, when I reviewed the exchange between Healthy Skepticism and the editor of the publishing Journal of the American Academy of Child and Adolescent Psychiatry [see the lesson of Study 329: naked Emperors, fractious Queens…], I left out parts of the authors’ response to the letter from Jureidini and Tonkin [2003]. This is where they attempt to explain "why" they felt justified in using the non-protocol outcomes:
This study was designed at a time when there were no randomized controlled trials showing antidepressant [tricyclic antidepressant or SSRI] superiority to placebo, so we had no prior data from which to astutely pick our outcome measures. The field has moved strongly away from using the Hamilton Rating Scale for Depression [HAM-D] in adolescent treatment studies and has gone virtually uniformly to using the Children’s Depression Rating Scale-Revised because the latter better and more reliably captures aspects of depression in youth. Surely a national regulatory body charged with approving or not approving a medication for a particular use might well simply say that if a study does not show efficacy on the primary endpoint[s], it is a failed study and secondary outcome measures cannot then be used for approval. However, as scientists and clinicians we must adjudge whether or not the study overall found evidence of efficacy, and we do not have the convenience of falling back on such a simple rule. If we choose wrongly [in whichever direction], we don’t treat depressed children as well as the data would permit. Because we found a clear pattern of significant p values across multiple secondary analyses [recovery as assessed by HAM-D < 8, HAM-D depressed mood item, the Schedule for Affective Disorders and Schizophrenia for School-Age Children depression item, and Clinical Global Impression score at endpoint], we thought and still think this provides significant evidence of efficacy of paroxetine compared with placebo in adolescent depression. Without established reliable measures that distinguish medication responders from nonresponders at the time the study was designed, it is not surprising that the primary measures did not reach significance while other measures did. It still provides a strong “signal” for efficacy…
Creative! I expect that the comments about the CDRS-R [Children’s Depression Rating Scale-Revised] are in the vicinity of reasonable. One wonders why they didn’t say this in the first place in either the article or the Clinical Study Report. But if you take a look at several previous posts [paxil in adolescents: “five easy pieces”…, an addendum…, and follow-up…], you’ll see a definitive counter to this creative, latter-day response [also apparent in this timeline]:
At the time the 329 authors wrote their response to Jon Jureidini and Ann Tonkin in May 2003, SKB [GSK] had already completed two other Clinical Trials of Paxil in adolescents – one of them actually using the CDRS-R as a primary outcome variable. Those two studies were eventually published [after the patent for Paxil expired], but they were conducted much earlier and SKB [GSK] had the results [top figure]. When they used the CDRS-R, Placebo actually beat Paxil [bottom figure in yellow]. So at the time of the authors’ response letter, they justified what they’d said in Study 329 with an argument they’d already tested and already knew was a dead end [Study 701]:

using MADRS:
by Ray Berard, Regan Fong, David J. Carpenter, Christine Thomason, and Christel Wilkinson
Journal of Child and Adolescent Psychopharmacology. 2006 16[1-2]:59–75.
Conclusions: No statistically significant differences were observed for paroxetine compared with placebo on the two prospectively defined primary efficacy variables. Paroxetine at 20–40 mg/day administered over a period of up to 12 weeks was generally well tolerated.

using CDRS-R:
by GRAHAM J. EMSLIE, KAREN DINEEN WAGNER, STAN KUTCHER, STAN KRULEWICZ, REGAN FONG, DAVID J. CARPENTER, ALAN LIPSCHITZ, ANDREA MACHIN, AND CHRISTEL WILKINSON
Journal of the American Academy of Child and Adolescent Psychiatry. 2006 45[6]:709-719.
Conclusions: Paroxetine was not shown to be more efficacious than placebo for treating pediatric major depressive disorder.
It may seem an odd way to end this particular run-on series of blog posts using a paragraph from a letter now over a decade old. But in study 329 vi: revisited…, I said, "the erroneous conclusion in Keller et al can hardly be chalked up to a mistake. It shows too many tell-tale signs of intention." That’s an opinion, my strong opinion, and I wanted to back it up with an example that didn’t just come from our reanalysis. In the very first real challenge to the article back in their 2003 letter to the JAACAP, Jon Jureidini and Ann Tonkin of Healthy Skepticism clearly saw what it has taken fourteen years of dogged persistence to finally insert into the literature in the form of our RIAT article [see the lesson of Study 329: naked Emperors, fractious Queens…]:
The article by Keller et al. [2001] is one of only two to date to show a positive response to selective serotonin reuptake inhibitors [SSRIs] in child or adolescent depression. We believe that the Keller et al. study shows evidence of distorted and unbalanced reporting that seems to have evaded the scrutiny of your editorial process. The study authors designated two primary outcome measures: change from baseline in the Hamilton Rating Scale for Depression [HAM-D] and response [set as fall in HAM-D below 8 or by 50%]. On neither of these measures did paroxetine differ significantly from placebo. Table 2 of the Keller article demonstrates that all three groups had similar changes in HAM-D total score and that the clinical significance of any differences between them would be questionable. Nowhere is this acknowledged. Instead:
  1. The definition of response is changed. As defined in the “Method” section, it has a nonsignificant p value of .11. In the “Results” section [without any explanation], the criterion for response is changed to reduction of HAM-D to below 8 [with a p value of .02]. By altering the criterion for the categorical measure of outcome, the authors are able to claim significance on a primary outcome measure.
  2. In reporting efficacy results, only “response” is indicated as a primary outcome measure, and it could be misunderstood that response was the primary outcome measure. Only in the discussion is it revealed that “Paroxetine did not separate statistically from placebo for…HAM-D total score,” without any acknowledgment that total score was one of the two primary outcome measures. The next sentence is a claim to have demonstrated efficacy for paroxetine.
Thus a study that did not show significant improvement on either of two primary outcome measures is reported as demonstrating efficacy. Given that the research was paid for by GlaxoSmithKline, the makers of paroxetine, it is tempting to explain the mode of reporting as an attempt to show the drug in the most favorable light. Given the frequency with which it is cited in other scientific papers, at conferences and educational functions, and in advertising, this article may have contributed to the increased prescribing of SSRI medication to children and adolescents. We believe it is a matter of importance to public health that you acknowledge the failings of this article, so that its findings can be more realistically appraised in decision-making about the use of SSRIs in children.
With a careful reading, they saw through to the essence of what was wrong without the benefit of any of the back story, the raw data, or the numerous analyses of this study that have followed over the years. It’s a great example for all of us to emulate. Being a doctor is hard work by any standard, and we feel good about putting in all the extra time it takes to stay current. I doubt there’s any profession that can claim the "life-long-learning" moniker any more than we can. You never really graduate from medical school and there’s a never-ending series of tests [AKA patients] as long as you’re in the game. So we get used to scanning, reading non-critically, in part because of the volume. But every one of us needs to learn how to recognize the signs that a given article needs to be read like Jon and Ann read this one. The modern industry-sponsored Clinical Trial literature in all of medicine is filled with articles that need a long second look. Without thinking, I coined a phrase answering a reporter’s questions about our paper, "it wasn’t sin – it was spin." In the political arena, they call it plausible deniability. I don’t really believe it wasn’t sin [it may be the biggest sin of all because it’s the kind people get away with]. But the phrase still conveys a useful diagnostic take-home message to remind us what we’re on the lookout for…
  1.  
    Bernard Carroll
    September 18, 2015 | 1:16 PM
     

    What difference does it make whether the failures of reporting in the 2001 Keller paper resulted from deceit or from abysmal poor tradecraft? Either way, they have lost our trust in their scientific credentials.

  2.  
    Tessa
    September 21, 2015 | 11:11 PM
     

    It was both spin and sin. The Bible says that the love of money is the root of all evil.
    Congratulations to the team that finally brought the truth.
