Well, our second Paxil Study 329 paper was published at the end of last week. I waited to mention it here until David Healy had a post about it – out today [see Study 329 Continuation Phase]. We originally submitted it to the Journal of the American Academy of Child and Adolescent Psychiatry, which turned it down [their peer review comments are on our website Restoring Study 329 – interesting in their own right]. I think what I’ll do is show a couple of graphs from that data, then reverse my usual m.o. by talking about it first and ending with the abstract:
Paxil Study 329 had a Continuation Phase where they followed the responders only, blinded on the same meds for six months. In the a priori Protocol, it was a Secondary Outcome Variable intended to measure the relapse rate. They didn’t mention it in Keller et al. I think they must’ve looked at that upper graph of the drop-out rate and shied away from the Continuation Phase altogether. The lower graph has the Raw HAM-D scores and, as expected, they showed no differences. But we never said that this was a badly designed study. On the contrary, it’s better than most and this six-month follow-up data is about the only longer-term SSRI dataset around, certainly in kids – so we decided to take a look.
Hypothesis Testing Research vs Material-Exploration
Scientific research and reasoning continually pass through the phases of the well-known empirical-scientific cycle of thought: observation – induction – deduction – testing [observe – guess – predict – check]. The use of statistical tests is of course first and foremost suited for “testing”, i.e., the fourth phase. In this phase one assesses whether certain consequences [predictions], derived from one or more precisely postulated hypotheses, come to pass. It is essential that these hypotheses have been precisely formulated and that the details of the testing procedure [which should be as objective as possible] have been registered in advance. This style of research, characteristic for the [third and] fourth phase of the cycle, we call hypothesis testing research.
This should be distinguished from a different type of research, which is common especially in [Dutch] psychology and which sometimes also uses statistical tests, namely material-exploration. Although assumptions and hypotheses, or at least expectations about the associations that may be present in the data, play a role here as well, the material has not been obtained specifically and has not been processed specifically as concerns the testing of one or more hypotheses that have been precisely postulated in advance. Instead, the attitude of the researcher is: “This is interesting material; let us see what we can find.” With this attitude one tries to trace associations [e.g., validities]; possible differences between subgroups, and the like. The general intention, i.e. the research topic, was probably determined beforehand, but applicable processing steps are in many respects subject to ad hoc decisions. Perhaps qualitative data are judged, categorized, coded, and perhaps scaled; differences between classes are decided upon “as suitable as possible”; perhaps different scoring methods are tried along-side each other; and also the selection of the associations that are researched and tested for significance happens partly ad-hoc, depending on whether “something appears to be there”, connected to the interpretation or extension of data that have already been processed.
When we pit the two types so sharply against each other it is not difficult to see that the second type has a character completely different from the first: it does not so much serve the testing of hypotheses as it serves hypothesis-generation, perhaps theory-generation — or perhaps only the interpretation of the available material itself…
If you only take one thing away from this entire 1boringoldman blog, let this be it. What’s been wrong with the clinical trial literature is that the papers are written as if they were some kind of anything-goes, free-wheeling Material Exploration with changing outcomes, creative statistics, and speculations-presented-as-facts. That’s dead wrong. They are Hypothesis Testing enterprises that require every bit of the rigor and attention to protocol described by de Groot. Product Testing exercises, not Exploratory Research! Hypothesis Testing, not Material-Exploration! …End of Sermon…
Now back to our Paxil Study 329 Continuation Phase paper. I’m not even going to try to summarize it because fellow author David Healy has done such a good job in Study 329 Continuation Phase. He and Jo Le Noury have a collective knack for looking at adverse event data. We did find some things after all, in spite of the drop-out rate – primarily by looking closely at the timing and various states of medication use. So look over the paper and be sure to read David’s posts, the one today and the one coming next week, for the details of what we found. Some pretty interesting Material Explorations in my book. Here’s another graphic and the abstract:
by Le Noury, Joanna; Nardo, John M; Healy, David; Jureidini, Jon; Raven, Melissa; Tufanaru, Catalin; and Abi-Jaoude, Elia.
International Journal of Risk & Safety in Medicine. 2016 28[3]:143-161.
OBJECTIVE: This is an analysis of the unpublished continuation phase of Study 329, the primary objective of which was to compare the efficacy and safety of paroxetine and imipramine with placebo in the treatment of adolescents with unipolar major depression. The objectives of the continuation phase were to assess safety and relapse rates in the longer term. The objective of this publication, under the Restoring Invisible and Abandoned Trials [RIAT] initiative, was to see whether access to and analysis of the previously unpublished dataset from the continuation phase of this randomized controlled trial would have clinically relevant implications for evidence-based medicine.
METHODS: The study was an eight-week double-blind randomized placebo-controlled trial with a six month continuation phase. The setting was 12 North American academic psychiatry centres, from 20 April 1994 to 15 February 1998. 275 adolescents with major depression were originally enrolled in Study 329, with 190 completing the eight-week acute phase. Of these, 119 patients [43%] entered the six-month continuation phase [paroxetine n=49; imipramine n=39; placebo n=31], in which participants were continued on their current treatment, blinded. As per the protocol, we have looked at rates of relapse [based on Hamilton Depression Scale scores] across both acute and continuation phases, and generated a safety profile for paroxetine and imipramine compared with placebo for up to six months. ANOVA testing [generalized linear model] using a model including effects of site, treatment and site x treatment interaction was applied. Otherwise we used only descriptive statistics.
RESULTS: Of patients entering the continuation phase, 15 of 49 for paroxetine [31%], 12 of 39 for imipramine [31%] and 12 of 31 for placebo [39%] completed as responders. Across the study, 25 patients on paroxetine relapsed [41% of those showing an initial response], 15 on imipramine [26%], and 10 on placebo [21%]. In the continuation and taper phases combined there were 211 adverse events in the paroxetine group, 147 on imipramine and 100 on placebo. The taper phase had a higher proportion of severe adverse events per week of exposure than the acute phase, with the continuation phase having the fewest events.
CONCLUSIONS: The continuation phase did not offer support for longer-term efficacy of either paroxetine or imipramine. Relapse and adverse events on both active drugs open up the risks of a prescribing cascade. The previously largely unrecognised hazards of the taper phase have implications for prescribing practice and need further exploration.
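For anyone who wants to check the arithmetic in the abstract, here is a minimal sketch in Python. The counts come straight from the METHODS and RESULTS paragraphs above; the variable names and the `pct` helper are my own illustration, not anything from the paper or its analysis code.

```python
# Counts as reported in the abstract; names and layout are illustrative only.
enrolled = 275
entered = {"paroxetine": 49, "imipramine": 39, "placebo": 31}
completed_as_responders = {"paroxetine": 15, "imipramine": 12, "placebo": 12}


def pct(part, whole):
    """Percentage rounded to the nearest whole number, as the abstract reports it."""
    return round(100 * part / whole)


# Total entering the continuation phase, and its share of everyone enrolled.
total_entered = sum(entered.values())
entry_pct = pct(total_entered, enrolled)

# Share of each arm that completed the continuation phase as responders.
completion_pct = {
    arm: pct(completed_as_responders[arm], n) for arm, n in entered.items()
}

print(f"entered continuation: {total_entered} of {enrolled} ({entry_pct}%)")
for arm, value in completion_pct.items():
    print(f"{arm}: {completed_as_responders[arm]} of {entered[arm]} ({value}%)")
```

The relapse percentages are not recomputed here because the abstract does not give the number of initial responders per arm, and I would rather leave that gap than guess at it.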