paxil study 352 revisited…

Posted on Sunday 25 November 2012

This is going to be a long post because I couldn’t figure out how to break it into chunks. It’s about Paxil Study 352, mentioned before [go pogo…, closer to becoming indelible…, and recently back on the front burner…]. The first quote is the Study write-up dated 2004 and is as close to the protocol as I can seem to get. The tables that follow are my rendering of the tables in the GSK document for clarity:
Study No.: PAR 352
Title: A Double-Blind, Placebo-Controlled, Comparison of Imipramine and Paroxetine in the Treatment of Bipolar Depression.
Rationale: The treatment of bipolar depression is a complex and controversial issue. While lithium therapy has been employed as the primary management of bipolar depression, 40-50% of those treated remain unresponsive. The failure of lithium as monotherapy in managing bipolar depression often leads physicians to consider combination therapy with tricyclic antidepressants. Paroxetine, a phenylpiperidine compound, is a specific inhibitor of serotonin reuptake. This compound is approved in some countries for use in the treatment of Major Depressive Disorder, Obsessive Compulsive Disorder, and Panic Disorder. The safety profile of selective serotonin reuptake inhibitors (SSRIs) offers an advantage compared to tricyclics and monoamine oxidase inhibitors (MAOIs). In addition, the lack of anticholinergic effects and dietary restrictions may increase the attractiveness of these agents to physicians and enhance subject compliance. Previous studies have demonstrated paroxetine’s antidepressant efficacy, yet there are no trials that specifically investigate its use in bipolar affective disorder. This trial offers the opportunity to compare the efficacy and safety of paroxetine and imipramine in subjects with bipolar depression who are stabilized on lithium therapy.
Phase: Phase IV
Study Period: 8 February 1994 to 29 March 1996
Study Design: This was a 10-week, multicentre, placebo- controlled, double blind, parallel group study.
Centres: 18 centres in the United States
Indication: Bipolar Disorder
Treatment: Following a one-week placebo (PBO) run-in period, subjects were randomized on a 1:1:1 basis to paroxetine (20-50mg/day), imipramine (50-300mg/day) or PBO for a 10-week treatment period. The study also included a Taper End and Follow-up visit. Subjects randomized to the paroxetine group received 20mg/day for Weeks 1-3. Starting at Week 4 until Week 6, 10mg dosage increments were allowed every seven days up to a maximum dose of 50 mg/day, depending on therapeutic response. Subjects randomized to the imipramine group received as forced titration 50mg/day on Week 1, 100mg/day during Week 2, and 150mg/day during Week 3. Beginning with Week 4, 50mg/day dosage increments were allowed every seven days up to a maximum dose of 300 mg/day, depending on therapeutic response. Identical placebo tablets to active drug were supplied to maintain the blind.
Objectives: The objective of this study was to compare the efficacy and safety of paroxetine and imipramine to PBO in the treatment of bipolar depression in subjects stabilized on lithium therapy.
Primary Outcome/Efficacy Variable: The primary efficacy variables were change from baseline in the Hamilton Rating Scale for Depression (HAMD) total score (first 17 items) and the change from baseline in the Clinical Global Impression Severity of Illness (CGI-S). The primary timepoint of interest for all efficacy assessments was each subject’s last available on-therapy observation.
Secondary Outcome/Efficacy Variable(s): The secondary efficacy variables were the proportion of subjects responding defined as a score of ≤7 on the HAMD, and the percentage of subjects with a CGI Global Improvement score of ≤2…
Things to notice. There were significant drop-outs along the way: only 74 of the 117 subjects completed the study [dropout rate 37%]. Also note that in the Primary Efficacy variable HAMD-17, the Baseline n and the Endpoint n are the same, meaning that they corrected for drop-outs using the LOCF [Last Observation Carried Forward] method. So the Change from Baseline Mean, Standard Error, and p values are all computed from the carried-forward values. Now, here is the abstract for the paper that was published in 2001:
Double-blind, placebo-controlled comparison of imipramine and paroxetine in the treatment of bipolar depression.
by Nemeroff CB, Evans DL, Gyulai L, Sachs GS, Bowden CL, Gergel IP, Oakes R, and Pitts CD.
American Journal of Psychiatry. 2001 158[6]:906-912.

OBJECTIVE: This study compared the efficacy and safety of paroxetine and imipramine with that of placebo in the treatment of bipolar depression in adult outpatients stabilized on a regimen of lithium.
METHOD: In a double-blind, placebo-controlled study, 117 outpatients with DSM-III-R bipolar disorder, depressive phase, were randomly assigned to treatment with paroxetine (N=35), imipramine (N=39), or placebo (N=43) for 10 weeks. In addition to lithium monotherapy, patients may have received either carbamazepine or valproate in combination with lithium for control of manic symptoms. Patients were stratified on the basis of trough serum lithium levels determined at the screening visit (high: >0.8 meq/liter; low: ≤0.8 meq/liter). Primary efficacy was assessed by change from baseline in scores on the Hamilton Rating Scale for Depression and the Clinical Global Impression illness severity scale.
RESULTS: Differences in overall efficacy among the three groups were not statistically significant. For patients with high serum lithium levels, antidepressant response at endpoint also did not significantly differ from placebo. However, both paroxetine and imipramine were superior to placebo for patients with low serum lithium levels. Compared to imipramine, paroxetine resulted in a lower incidence of adverse events, most notably emergence of manic symptoms.
CONCLUSIONS: Antidepressants may not be useful adjunctive therapy for bipolar depressed patients with high serum lithium levels. However, antidepressant therapy may be beneficial for patients who cannot tolerate high serum lithium levels or who have symptoms that are refractory to the antidepressant effects of lithium.
The paper has this table as its sole meaningful report on the data. Note that they separated the data based on the serum lithium levels at screening:
The table below is from the Study Report on their web-site and the source of the published table. The cells with the green background were significant [by a hair]:
They did the same thing with the response data, and found no significance:
Note: My tables show only the HAMD results. The CGI results are available and parallel the HAMD. Just saving space. This post is too long already!

At first glance, there’s something wrong with this Clinical Trial article. Where are the graphs? Standard practice is to display the changes over time for the duration of the study. They are nowhere to be found. Did they only measure the HAMD and CGI at the baseline and endpoint? That’s unlikely since they saw the subjects along the way to adjust medications and draw lithium levels [0, 2, 4, 6, & 10 weeks]. Even more compelling, they had a "last observation" value for all drop-outs. So one has to conclude they had serial measurements of the outcome parameters, but that they’re not in the article.
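To make the "last observation" point concrete, here is a minimal sketch of how an LOCF endpoint is computed. The subjects and HAMD-17 scores are invented for illustration and are not from Study 352; the visit weeks are the ones listed above, and the function names are my own. The point is simply that an endpoint value for a drop-out can only exist if scores were recorded at the intermediate visits.

```python
from statistics import mean

# Visit schedule mentioned in the post (weeks 0, 2, 4, 6, 10).
WEEKS = [0, 2, 4, 6, 10]

# HYPOTHETICAL HAMD-17 totals, one list per subject, aligned with WEEKS.
# None marks a visit that never happened because the subject dropped out.
subjects = {
    "A": [24, 20, 15, 12, 9],          # completer
    "B": [26, 22, None, None, None],   # dropped out after week 2
    "C": [22, 21, 19, None, None],     # dropped out after week 4
}


def locf_endpoint(scores):
    """Return the last non-missing score (Last Observation Carried Forward)."""
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("subject has no observations at all")
    return observed[-1]


def change_from_baseline(scores):
    """Endpoint (by LOCF) minus the baseline score at week 0."""
    return locf_endpoint(scores) - scores[0]


# LOCF keeps every randomized subject in the endpoint analysis,
# which is why the Baseline n and Endpoint n come out the same.
locf_changes = {sid: change_from_baseline(s) for sid, s in subjects.items()}

# A completers-only analysis would keep just those with a week-10 score.
completer_changes = {
    sid: s[-1] - s[0] for sid, s in subjects.items() if s[-1] is not None
}

print("LOCF changes:", locf_changes)
print("LOCF mean change:", mean(locf_changes.values()))
print("Completers-only changes:", completer_changes)
```

In this toy data the two drop-outs contribute whatever they scored at their last visit, so the LOCF mean is pulled toward their early, smaller changes. Whether that helps or hurts a given treatment arm depends entirely on who drops out and when, which is why the serial data matter.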

Note that in the published Table 1, there is no mention of LOCF [Last Observation Carried Forward], nor is it mentioned in the paper, yet it is clear in the data summary [published 3 years later]. In fact, Table 1 says "after ten weeks", but there were significant drop-outs [37%] not discussed in the article except later in the adverse events. The only indicator of how drop-outs were handled is in the Data Analysis section:
"Data are presented from the intent-to-treat population. The endpoint data set was the primary time point of interest and was determined for each patient from the last available observation while receiving treatment."
Since the only significant finding reported is the poor performance of the low serum lithium placebo group, the size of that group matters, but like the drop-out data, it’s not there nor can it be inferred since all values are expressed in terms of the "intent-to-treat" population irrespective of the drop-outs. And then there’s this:
"The primary comparison of interest was between the paroxetine and placebo treatment groups regardless of lithium stratification. Because all other statistical comparisons were considered to be secondary, no adjustments for multiple comparisons were made. Therefore, the achievement of statistical significance for the primary efficacy variables at endpoint (i.e., changes from baseline in scores on the Hamilton depression scale and CGI severity of illness scale) was set at p≤0.05."
A standard way to adjust statistics for multiple comparisons [like the artificial separation of their results by lithium stratification] is the Bonferroni correction, but they say that they don’t need to correct their results because it wasn’t their "primary comparison of interest" [a silly rationalization]. I suspect that a more likely reason is that the correction would’ve made their only significant finding evaporate. But they concluded:
"Antidepressants may not be useful adjunctive therapy for bipolar depressed patients with high serum lithium levels. However, antidepressant therapy may be beneficial for patients who cannot tolerate high serum lithium levels or who have symptoms that are refractory to the antidepressant effects of lithium."
Even with the artificial post hoc separation of their results by lithium stratification, the Response Rates [HAMD-17 ≤7 or CGI Improvement ≤2] were not significant. The only evidence was the change in HAMD-17 values, in question because of the absent n and the failure to correct for multiple comparisons. But there was something else:
"All statistical tests were two-tailed. Tests of hypothesis of interactions were made at the 10% significance level, and all other tests were made at the 5% significance level. Data are presented as means and standard deviations. The CONTRAST statement from the general linear model procedure of SAS was used for treatment group comparisons. Interaction assessments were conducted as per protocol. However, significant interactions were not found and therefore not presented."
They concluded that there was an interaction between the serum lithium level and the antidepressant response, yet the statistic used to evaluate that kind of interaction was not significant, so they didn’t show it. No surprise that they didn’t show it – it invalidated their conclusion.
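For readers who haven’t met the Bonferroni correction mentioned above, here is a small sketch of the arithmetic. The p value and the numbers of comparisons are hypothetical, chosen only to illustrate a result that is significant "by a hair"; they are not the values from the paper, and the function names are my own.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / n_comparisons


def survives(p_value, alpha, n_comparisons):
    """True if p_value is still significant after the correction."""
    return p_value <= bonferroni_threshold(alpha, n_comparisons)


ALPHA = 0.05            # the level the paper says it used
HYPOTHETICAL_P = 0.04   # invented example of a marginal result

# Each extra comparison (drug vs. placebo, within each lithium stratum,
# for each outcome measure, ...) shrinks the threshold.
for m in (1, 2, 4, 8):
    threshold = bonferroni_threshold(ALPHA, m)
    verdict = "significant" if survives(HYPOTHETICAL_P, ALPHA, m) else "not significant"
    print(f"{m} comparison(s): threshold {threshold:.5f} -> {verdict}")
```

With even two comparisons, the threshold drops to 0.025 and a marginal p value like the hypothetical one here no longer qualifies. Bonferroni is the most conservative of the common corrections, but the direction of the effect is the same with any of them.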

There’s more data gaming in the paper, but this is enough to make the point that this paper demonstrates a consistent and persistent pattern of deceit in analysis and presentation. The separation by lithium stratification was made after the fact. I recently learned this is called HARKing [Hypothesizing After the Results are Known]. Significant amounts of data went missing, and there was cherry-picking of results and deceitful data presentation and analysis, all leading to unsupportable conclusions. It’s impossible to avoid the conclusion that all of this was deliberate – actively misusing the tools of science, presentation, and analysis to create a positive outcome in a decidedly negative study. Add to that, the article was ghost-written by Scientific Therapeutics Information on contract with the drug’s manufacturer, GSK, who also chose the first author, Dr. Charles Nemeroff, based on name recognition, not participation in the actual authorship of the paper.

It may seem like nit-picking to continue to try to expose the egregious misdeeds in the scientific literature in psychiatry in the past. After all, Study 352 was conducted 18 years ago [1994] and published 11 years ago [2001]. But thinking that the sins of the past aren’t relevant ["Let sleeping dogs lie"] is a delusion. Something terrible happened that discredited our literature, our specialty, and affected how psychiatrists practiced and patients were treated. It can still happen in the present or tomorrow. There’s more awareness now, but the participating KOLs, the drug companies, the drugs involved and the pressure to prescribe them, the medical writing firms, and the Clinical Research Organizations haven’t gone anywhere. The efforts to bring what happened in Study 352 into the light of day have thus far gone nowhere…
    Bernard Carroll
    November 25, 2012 | 1:32 PM

    I recall smelling a rat when this report appeared in American Journal of Psychiatry in 2001 with Nemeroff as lead author. That intuition has been borne out. We now know that Nemeroff did little of the heavy lifting to execute Glaxo study 352 and that his promotion to lead author was an utterly cynical act. In cahoots with Sally Laden and STI, Nemeroff honed a style of deceit many would call being economical with the truth. In their infomercials talking up corporate experimercials, strategic omissions were an important part of their modus operandi. These are not hard to identify throughout Nemeroff’s published work. Let’s hope the ongoing complaint to Office of Research Integrity at NIH goes somewhere.

    November 25, 2012 | 10:52 PM

    See Molnar, 2008
    “One such method is “last observation carried forward.” This technique replaces a participant’s missing values after dropout with the last available measurement and assumes that the participant’s responses (e.g., outcome measures) would have been stable from the point of dropout to trial completion, rather than declining or improving further. It also assumes that missing values are “missing completely at random” (i.e., that the probability of dropout is not related to variables such as disease severity, symptoms, group assignment or drug side effects).”

    LOCF always seemed like a very questionable tactic. The subject dropped out, so clearly there was a decline from the LOCF value, and if it was due to adverse effects, etc., this should be calculated as a negative for efficacy.

    Mark Kramer
    November 26, 2012 | 5:59 PM

    What a farce.
