Randomized Double-Blind Placebo-Controlled Clinical Trials are nothing new. They’ve been around for a long time, and there are conventions about how they’re conducted and reported. The outcome parameter is measured at intervals over the duration of the study; then the results for the placebo and treatment arms are displayed graphically [or sometimes in a table], usually with some method to deal with drop-outs [LOCF or a mixed model], and placebo vs treatment is tested statistically at each interval. Often, they compare response and/or remission rates driven by predefined criteria. Some studies also calculate effect size, number needed to treat, or the odds ratio as a measure of the strength of the drug effect. And there’s one more up-front calculation of note: the Clinical Trial mavens estimate how many subjects will be needed to prove their point one way or the other – a calculation driven by the power of the trial.
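To make those strength-of-effect measures concrete, here’s a minimal sketch of the arithmetic behind number needed to treat and the odds ratio. The response rates used below are hypothetical placeholders of my own, not data from any study:

```python
# Sketch of the usual strength-of-effect measures from a trial's response rates.
# The rates passed in below are hypothetical, not taken from any actual study.

def effect_measures(p_treat: float, p_placebo: float) -> dict:
    """Absolute risk difference, number needed to treat, and odds ratio."""
    arr = p_treat - p_placebo                      # absolute risk difference
    nnt = float("inf") if arr == 0 else 1.0 / arr  # number needed to treat
    odds_ratio = (p_treat / (1 - p_treat)) / (p_placebo / (1 - p_placebo))
    return {"risk_difference": arr, "NNT": nnt, "odds_ratio": odds_ratio}

m = effect_measures(p_treat=0.50, p_placebo=0.30)
print(m)   # NNT = 5: treat five patients to get one extra responder
```

The point of reporting these alongside p-values is that they answer a different question – not “is there an effect?” but “how big is it?”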
So, back to Paxil Study 352. It is unique [paxil study 352 revisited…] in that there is neither graph nor table showing the data over time. In fact, the whole article follows few of the conventions of Clinical Trial reporting – visible from across the room: one table [spread over two pages] and one gratuitous graph [Lithium Levels?].
So what about displayed graphically, tested statistically, response and/or remission rates, effect size, number needed to treat, odds ratio…
During the 10-week study period, patients were assessed for both efficacy and adverse events at baseline and at weeks 1–6, 8, and 10.
Mean changes in score on the Hamilton depression scale and CGI severity of illness scale from baseline to endpoint for the paroxetine and imipramine groups were not significantly different from those of the placebo-treated group [Table 1].
For the total intent-to-treat population, there were no statistically significant differences in response rates among those receiving paroxetine, imipramine, or placebo (per Hamilton criterion: 45.5% [N=15], 38.9% [N=14], and 34.9% [N=15], respectively; per CGI criterion: 54.5% [N=18], 58.3% [N=21], and 46.5% [N=20]). Among the study completers, Hamilton depression scale scores ≤7 were achieved by 56.0% (N=14 of 25) of the paroxetine-treated patients, 47.8% (N=11 of 23) of the imipramine-treated patients, and 53.8% (N=14 of 26) of the placebo-treated patients. Similarly, CGI global improvement scores ≤2 were achieved by 68.0% (N=17) of the paroxetine-treated patients, 73.9% (N=17) of the imipramine-treated patients, and 69.2% (N=18) of the placebo-treated patients.
So here we are at the midway point. What we have is a negative study with no display of the scores vs time, an odd [but not significant] primary outcome variable, and an insignificant secondary outcome variable [response]. The strength of the effect is not calculated because there’s no effect to calculate the strength of. Looks to me like if you’re Bipolar, stabilized on Lithium, and you get depressed, neither Imipramine nor Paroxetine is right for you.
The group was stratified on the basis of serum lithium level at the screening examination (high: >0.8 meq/liter, low: ≤0.8 meq/liter). Lithium stratification criteria were determined a priori. The proportion of patients achieving dichotomous response was analyzed by the Cochran-Mantel-Haenszel test adjusting for lithium stratification or by Fisher’s exact test. The chi-square test was used for analyses within lithium strata.
Among patients with high serum lithium levels, similar response rates were noted among those receiving paroxetine, imipramine, or placebo (per Hamilton criterion: 35.7% [N=5], 41.2% [N=7], and 38.1% [N=8], respectively; per CGI criterion: 57.1% [N=8], 47.1% [N=8], and 52.4% [N=11]). For those with low serum lithium levels, no statistically significant differences in response rates were seen among those receiving paroxetine, imipramine, or placebo (per Hamilton criterion: 52.6% [N=10], 36.8% [N=7], and 31.8% [N=7], respectively; per CGI criterion: 52.6% [N=10], 68.4% [N=13], and 40.9% [N=9]).
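For a sense of what those tests actually do with numbers like these, here’s a sketch using scipy on the intent-to-treat Hamilton response rates reported above. The responder counts are the published ones; the group denominators (33, 36, 43) are back-calculated from the published percentages and are my assumption:

```python
# Sketch: the kinds of tests the paper describes, applied to the reported
# ITT response rates (per Hamilton criterion). The denominators (33, 36, 43)
# are back-calculated from the published percentages -- an assumption.
from scipy.stats import chi2_contingency, fisher_exact

responders    = [15, 14, 15]          # paroxetine, imipramine, placebo
group_sizes   = [33, 36, 43]
nonresponders = [n - r for n, r in zip(group_sizes, responders)]

# Overall 2x3 chi-square across the three arms
chi2, p_overall, dof, _ = chi2_contingency([responders, nonresponders])
print(f"chi-square across arms: p = {p_overall:.3f}")

# Pairwise paroxetine vs placebo, Fisher's exact test
_, p_pair = fisher_exact([[15, 33 - 15], [15, 43 - 15]])
print(f"paroxetine vs placebo (Fisher): p = {p_pair:.3f}")
```

Both p-values come out well above 0.05, consistent with the article’s “no statistically significant differences.” The Cochran-Mantel-Haenszel test the paper mentions – the one that adjusts for the lithium strata – isn’t in scipy; statsmodels’ StratifiedTable provides it.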
The general linear model procedure of SAS (Cary, N.C.) was used to perform the analysis with a model that included effects for treatment and lithium strata for scores on the Hamilton depression scale (first 17 items) and CGI severity of illness scale… The treatment-by-lithium strata interaction was found to be nonsignificant and was not included in the model.
Because all other statistical comparisons were considered to be secondary, no adjustments for multiple comparisons were made. Therefore, the achievement of statistical significance for the primary efficacy variables at endpoint (i.e., changes from baseline in scores on the Hamilton depression scale and CGI severity of illness scale) was set at p≤0.05.
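Worth noting what “no adjustments for multiple comparisons” means in practice: with many secondary comparisons each tested at p ≤ 0.05, the chance of at least one false positive grows quickly. A quick sketch of that arithmetic, assuming independent comparisons:

```python
# Familywise error rate for k independent comparisons, each at alpha = 0.05:
#   P(at least one false positive) = 1 - (1 - alpha)^k
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:2d} comparisons -> familywise error rate = {fwer:.2f}")
```

At 20 unadjusted comparisons, the odds of a spurious “significant” finding are better than even – which is why digging through unadjusted secondary analyses for a positive result is frowned upon.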
The paroxetine 352 bipolar trial: A study in medical ghostwriting
by Jay D. Amsterdam and Leemon B. McHenry
International Journal of Risk & Safety in Medicine 2012 24:221–231.
The original protocol sample size estimate of 0.9 (1-β) or 62 subjects per treatment group was officially amended downward to 0.8 (1-β) or 46 subjects per group during the study. The latter value was the sample size described in the GSK Clinical Trials Website Result Summary. No explanation was provided for this change in sample size in the amended protocol. However, we suspect that this reduction in power might have resulted from exceedingly slow subject recruitment into the study, which ultimately led GSK to add a 19th investigative site. By the time GSK decided to halt subject enrollment prematurely and terminate the study, only 117 (of the originally projected 186 subjects) were enrolled, resulting in final sample sizes for paroxetine (n = 35), imipramine (n = 39), and placebo (n = 43). By the time the study was published in June 2001 in the American Journal of Psychiatry, however, the declared sample size estimate had again changed with the article stating: “The study was designed (sic) to enroll 35 patients per arm, which would allow 70% power to detect a 5-point difference on the Hamilton depression scale score (SD = 8.5) between treatment groups”.
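The three sample-size figures in that quote can be checked against the standard normal-approximation formula for comparing two group means, using the article’s own parameters (a 5-point Hamilton difference, SD = 8.5, two-sided α = 0.05). This is a sketch of that arithmetic, not GSK’s actual calculation:

```python
# Check the quoted sample sizes with the usual normal-approximation formula:
#   n per group = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2
# delta = 5 points on the Hamilton scale, sigma = 8.5, two-sided alpha = 0.05.
import math
from scipy.stats import norm

def n_per_group(power: float, delta: float = 5.0, sigma: float = 8.5,
                alpha: float = 0.05) -> int:
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

def power_at(n: int, delta: float = 5.0, sigma: float = 8.5,
             alpha: float = 0.05) -> float:
    z_a = norm.ppf(1 - alpha / 2)
    return norm.cdf(math.sqrt(n * delta ** 2 / (2 * sigma ** 2)) - z_a)

print(n_per_group(0.9))   # ~61 per group (the protocol said 62)
print(n_per_group(0.8))   # 46 per group (the amended figure)
print(power_at(35))       # ~0.69 -- the "70% power" with 35 per arm
```

The formula reproduces the paper trail: roughly 61–62 per group at 90% power (small differences come from rounding conventions and t-distribution corrections), exactly 46 at 80%, and about 69–70% power at 35 per arm – i.e., the “designed” sample size in the published article matches the paroxetine group GSK actually ended up with, which is the authors’ point.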
Although the published article noted that statistical power was estimated at only 70%, the article did not inform the reader that this value represented an unconventionally low power for a clinical trial. The article did not inform the reader that the original power estimate was 62 subjects per group or that the original power estimate had been officially reduced during the study. Moreover, the article made no mention of the fact that the final power estimate was determined after the study was completed, and that this post hoc power estimate most likely occurred as an ‘extra-regulatory’ protocol change in order to allow the final sample size estimate of 35 subjects per group to comport with the final sample size of the paroxetine group (i.e., n = 35). The published article failed to acknowledge clearly that the study failed to recruit the projected sample size necessary to test the primary study hypothesis, and only hinted by its published sample size estimate that the study had insufficient statistical power to test the primary study aims.