an innovative design…

Posted on Friday 18 September 2015

It has been quite a week, so I haven’t had much else on my mind outside of our own publication [Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence], but I ran across this paper and thought it was pretty interesting – focused on a topic that’s close to what we were writing about:
PLoS Medicine
by Yasmina Molero, Paul Lichtenstein, Johan Zetterqvist, Clara Hellner Gumpert, and Seena Fazel
September 15, 2015

Background: Although selective serotonin reuptake inhibitors [SSRIs] are widely prescribed, associations with violence are uncertain.
Methods and Findings: From Swedish national registers we extracted information on 856,493 individuals who were prescribed SSRIs, and subsequent violent crimes during 2006 through 2009. We used stratified Cox regression analyses to compare the rate of violent crime while individuals were prescribed these medications with the rate in the same individuals while not receiving medication. Adjustments were made for other psychotropic medications. Information on all medications was extracted from the Swedish Prescribed Drug Register, with complete national data on all dispensed medications. Information on violent crime convictions was extracted from the Swedish national crime register. Using within-individual models, there was an overall association between SSRIs and violent crime convictions [hazard ratio [HR] = 1.19, 95% CI 1.08–1.32, p < 0.001, absolute risk = 1.0%]. With age stratification, there was a significant association between SSRIs and violent crime convictions for individuals aged 15 to 24 y [HR = 1.43, 95% CI 1.19–1.73, p < 0.001, absolute risk = 3.0%]. However, there were no significant associations in those aged 25–34 y [HR = 1.20, 95% CI 0.95–1.52, p = 0.125, absolute risk = 1.6%], in those aged 35–44 y [HR = 1.06, 95% CI 0.83–1.35, p = 0.666, absolute risk = 1.2%], or in those aged 45 y or older [HR = 1.07, 95% CI 0.84–1.35, p = 0.594, absolute risk = 0.3%]. Associations in those aged 15 to 24 y were also found for violent crime arrests with preliminary investigations [HR = 1.28, 95% CI 1.16–1.41, p < 0.001], non-violent crime convictions [HR = 1.22, 95% CI 1.10–1.34, p < 0.001], non-violent crime arrests [HR = 1.13, 95% CI 1.07–1.20, p < 0.001], non-fatal injuries from accidents [HR = 1.29, 95% CI 1.22–1.36, p < 0.001], and emergency inpatient or outpatient treatment for alcohol intoxication or misuse [HR = 1.98, 95% CI 1.76–2.21, p < 0.001]. With age and sex stratification, there was a significant association between SSRIs and violent crime convictions for males aged 15 to 24 y [HR = 1.40, 95% CI 1.13–1.73, p = 0.002] and females aged 15 to 24 y [HR = 1.75, 95% CI 1.08–2.84, p = 0.023]. However, there were no significant associations in those aged 25 y or older. One important limitation is that we were unable to fully account for time-varying factors.
Conclusions: The association between SSRIs and violent crime convictions and violent crime arrests varied by age group. The increased risk we found in young people needs validation in other studies.


Scandinavia has always been a special place for medical epidemiology. The countries are isolated, self-contained, and they have centralized and detailed record-keeping going back to the dawn of time. If you’re looking for twins adopted into different families at birth to study nature vs nurture, head for Scandinavia. This is one of those studies – a Sweden-wide, three-year look at the relationship between taking SSRIs and violence. But there’s something more. They used the subjects themselves as their own controls [which struck me as a really bright thought].

Besides having access to a cohort of 8+M people [~10% on SSRIs], with their prescription records and the public records of every brush with the law, they must have had some mighty fine computers and statisticians to extract their data and cross-check so many covariates. I couldn’t possibly "vet" all of their analyses. But the core thread is that they isolated periods when patients were "on" SSRIs and when they were "off" the medication, and they compared the arrest rates for violent crimes "on" and "off" – deriving a Hazard Ratio.
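For readers who want to see the mechanics, here is a minimal sketch of one way a within-individual ["self as own control"] comparison like that can be set up in R with the survival package. The toy data and variable names below are hypothetical, and the actual paper’s model was considerably richer [time-varying exposure periods, adjustment for other psychotropic medications, and so on]:

library(survival)

# Hypothetical person-period data: one row per medication period per person.
# 'time' is the length of the period in days [or the day of the first violent
# crime conviction within it], 'event' flags whether a conviction occurred,
# and 'on_ssri' flags whether the period was on treatment.
periods <- data.frame(
  id      = c(1, 1, 2, 2),
  on_ssri = c(1, 0, 0, 1),
  time    = c(150, 400, 90, 300),
  event   = c(1,   0,   1,   0)
)

# strata(id) confines every comparison to a single person: each individual's
# "on" periods are compared only with his or her own "off" periods, which is
# the self-as-own-control idea described above.
fit <- coxph(Surv(time, event) ~ on_ssri + strata(id), data = periods)
summary(fit)   # exp(coef) is the within-individual hazard ratio for SSRI-exposed time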

While the paper deserves a careful reading, it feels like they’ve done their due diligence. There have been a ton of papers that have tried to debunk the black box warning of aggressive behavior in some adolescents on these SSRIs – and many of them focus on population studies:

  1. Gibbons RD, Hur K, Bhaumik DK, Mann JJ.
    Arch Gen Psychiatry. 2005 Feb;62(2):165-72.
  2. Gibbons RD, Hur K, Bhaumik DK, Mann JJ.
    Am J Psychiatry. 2006 Nov;163(11):1898-904.
  3. Charles B. Nemeroff, Amir Kalali, Martin B. Keller, Dennis S. Charney, Susan E. Lenderts, Elisa F. Cascade, Hugo Stephenson, and Alan F. Schatzberg
    Arch Gen Psychiatry. 2007 Apr;64(4):466-472.
  4. Nakagawa A, Grunebaum MF, Ellis SP, Oquendo MA, Kashima H, Gibbons RD, Mann JJ.
    J Clin Psychiatry. 2007 Jun;68(6):908-916.
  5. Benji T. Kurian, MD, MPH; Wayne A. Ray, PhD; Patrick G. Arbogast, PhD; D. Catherine Fuchs, MD; Judith A. Dudley, BS; William O. Cooper, MD, MPH
    JAMA: Pediatrics. 2007 Jun;161(7):690-696.
  6. Gibbons RD, Brown CH, Hur K, Marcus SM, Bhaumik DK, Mann JJ.
    Am J Psychiatry. 2007 Jul;164(7):1044-1049.
  7. Gibbons RD, Brown CH, Hur K, Marcus SM, Bhaumik DK, Erkens JA, Herings RM, Mann JJ.
    Am J Psychiatry. 2007 Sep;164(9):1356-1363.
  8. Brown CH, Wyman PA, Brinales JM, Gibbons RD.
    Int Rev Psychiatry. 2007 Dec;19(6):617-631.
  9. Gibbons RD, Segawa E, Karabatsos G, Amatya AK, Bhaumik DK, Brown CH, Kapur K, Marcus SM, Hur K, Mann JJ.
    Stat Med. 2008 May 20;27(11):1814-1833.
  10. Barry CL and Busch SH.
    Pediatrics. 2010 125[1]:88-95.
  11. Gibbons RD, Mann JJ.
    Drug Saf. 2011 May 1;34(5):375-395.
  12. Susan Busch, Ezra Golberstein, Ellen Meara
    NATIONAL BUREAU OF ECONOMIC RESEARCH, September 2011.
  13. Robert D. Gibbons, Hendricks Brown, Kwan Hur, John M. Davis, and J. John Mann
    Arch Gen Psychiatry. 2012 Jun;69(6):580-587.
  14. Gibbons RD, Coca Perraillon M, Hur K, Conti RM, Valuck RJ, and Brent DA
    Pharmacoepidemiology and Drug Safety. 2014 Sep 29. doi: 10.1002/pds.3713. [Epub ahead of print]
  15. Christine Y Lu, Fang Zhang, Matthew D Lakoma, Jeanne M Madden, Donna Rusinak, Robert B Penfold, Gregory Simon, Brian K Ahmedani, Gregory Clarke, Enid M Hunkeler, Beth Waitzfelder, Ashli Owen-Smith, Marsha A Raebel, Rebecca Rossom, Karen J Coleman, Laurel A Copeland, Stephen B Soumerai
    BMJ. 2014 348:g3596.
  16. PSYCHIATRICNEWS
    by Mark Moran
    12/30/2014
  17. by Richard A. Friedman, M.D.
    New England Journal of Medicine 2014 371:1666-1668.
  18. by Marc B. Stone, M.D.
    New England Journal of Medicine 2014 371:1668-1671.
  19. New York Times
    by Richard A. Friedman
    AUGUST 3, 2015
While I’ll readily admit that the findings in this Swedish study fit my own ideas about this topic, I’m impressed that they brought an innovative design using objective measures to bear on the problem. I think it’s a study well worth looking into in more depth. Most of the articles in the list above start with a conclusion and then try to validate it [in my humble opinion]…
Mickey @ 8:00 AM

study 329 x – “it wasn’t sin – it was spin”…

Posted on Thursday 17 September 2015

[Note: the Press coverage of our article is on study329.org, but I wanted to mention the article on Retraction Watch because it has Dr. Martin Keller’s response to our paper with an argument similar to the one below…]

We know from this internal memo and position piece that the initial SKB interpretation of the efficacy results from Study 329 mirrored those reported in our RIAT article:
14 OCT 1998

Please find attached to this memo a position piece, prepared by Julie Wilson of CMAT, summarising the results of the clinical studies in Adolescent Depression.

As you will know, the results of the studies were disappointing in that we did not reach statistical significance on the primary end points and thus the data do not support a label claim for the treatment of Adolescent Depression. The possibility of obtaining a safety statement from this data was considered but rejected. The best which could have been achieved was a statement that, although safety data, was reassuring, efficacy had not been demonstrated. Consultation of the Marketing Teams via Regulatory confirmed that this would be unacceptable commercially and the decision to take no regulatory action was recently endorsed by the TAT.

As you will see from the position piece the positive trends In efficacy which were seen in Study 329 are being published as a poster at ECNP this year and a full manuscript is in development. Published references will therefore be available for the study. There are no plans to publish data from Study 377.

This report has been prepared for internal use only. Data on File summaries will be prepared and issued once the final reports from the studies have been approved. This position piece will also be available on the Seroxat/Paxil resource database.

TARGET [from the Wilson position piece mentioned above]
To effectively manage the dissemination of these data in order to minimize any potential negative commercial impact
This was, indeed, a negative study, though the published article reached the opposite conclusion [2001]:
Paroxetine is generally well tolerated and effective for major depression in adolescents.
Three years ago, when I reviewed the exchange between Healthy Skepticism and the editor of the publishing Journal of the American Academy of Child and Adolescent Psychiatry [see the lesson of Study 329: naked Emperors, fractious Queens…], I left out parts of the authors’ response to the letter from Jureidini and Tonkin [2003]. This is where they attempt to explain "why" they felt justified in using the non-protocol outcomes:
This study was designed at a time when there were no randomized controlled trials showing antidepressant [tricyclic antidepressant or SSRI] superiority to placebo, so we had no prior data from which to astutely pick our outcome measures. The field has moved strongly away from using the Hamilton Rating Scale for Depression [HAM-D] in adolescent treatment studies and has gone virtually uniformly to using the Children’s Depression Rating Scale-Revised because the latter better and more reliably captures aspects of depression in youth. Surely a national regulatory body charged with approving or not approving a medication for a particular use might well simply say that if a study does not show efficacy on the primary endpoint[s], it is a failed study and secondary outcome measures cannot then be used for approval. However, as scientists and clinicians we must adjudge whether or not the study overall found evidence of efficacy, and we do not have the convenience of falling back on such a simple rule. If we choose wrongly [in whichever direction], we don’t treat depressed children as well as the data would permit. Because we found a clear pattern of significant p values across multiple secondary analyses [recovery as assessed by HAM-D < 8, HAM-D depressed mood item, the Schedule for Affective Disorders and Schizophrenia for School-Age Children depression item, and Clinical Global Impression score at endpoint], we thought and still think this provides significant evidence of efficacy of paroxetine compared with placebo in adolescent depression. Without established reliable measures that distinguish medication responders from nonresponders at the time the study was designed, it is not surprising that the primary measures did not reach significance while other measures did. It still provides a strong “signal” for efficacy…
Creative! I expect that the comments about the CDRS-R [Children’s Depression Rating Scale-Revised] are in the vicinity of reasonable. One wonders why they didn’t say this in the first place in either the article or the Clinical Study Report. But if you take a look at several previous posts [paxil in adolescents: “five easy pieces”…, an addendum…, and follow-up…], you’ll see a definitive counter to this creative, latter day response [also apparent in this timeline]:
At the time the 329 authors wrote their response to Jon Jureidini and Ann Tonkin in May 2003, SKB [GSK] had already completed two other Clinical Trials of Paxil in adolescents – one of them actually using the CDRS-R as a primary outcome variable. Those two studies were eventually published [after the patent for Paxil expired], but they were conducted much earlier and SKB [GSK] had the results [top figure]. When they used the CDRS, Placebo actually beat Paxil [bottom figure in yellow]. So at the time of that authors’ response letter, they justified what they’d said in Study 329 with an argument they’d already tested and already knew was a dead end [Study 701]:

using MADRS:
by Ray Berard, Regan Fong, David J. Carpenter, Christine Thomason, and Christel Wilkinson
Journal of Child and Adolescent Psychopharmacology. 2006 16[1-2]:59–75.
Conclusions: No statistically significant differences were observed for paroxetine compared with placebo on the two prospectively defined primary efficacy variables. Paroxetine at 20–40 mg/day administered over a period of up to 12 weeks was generally well tolerated.

using CDRS-R:
by GRAHAM J. EMSLIE, KAREN DINEEN WAGNER, STAN KUTCHER, STAN KRULEWICZ, REGAN FONG, DAVID J. CARPENTER, ALAN LIPSCHITZ, ANDREA MACHIN, AND CHRISTEL WILKINSON
Journal of the American Academy of Child and Adolescent Psychiatry. 2006 45[6]:709-719.
Conclusions: Paroxetine was not shown to be more efficacious than placebo for treating pediatric major depressive disorder.
It may seem an odd way to end this particular run-on series of blog posts using a paragraph from a letter now over a decade old. But in study 329 vi: revisited…, I said, "the erroneous conclusion in Keller et al can hardly be chalked up to a mistake. It shows too many tell-tale signs of intention." That’s an opinion, my strong opinion, and I wanted to back it up with an example that didn’t just come from our reanalysis. In the very first real challenge to the article back in their 2003 letter to the JAACAP, Jon Jureidini and Ann Tonkin of Healthy Skepticism clearly saw what it has taken fourteen years of dogged persistence to finally insert into the literature in the form of our RIAT article [see the lesson of Study 329: naked Emperors, fractious Queens…]:
The article by Keller et al. [2001] is one of only two to date to show a positive response to selective serotonin reuptake inhibitors [SSRIs] in child or adolescent depression. We believe that the Keller et al. study shows evidence of distorted and unbalanced reporting that seems to have evaded the scrutiny of your editorial process. The study authors designated two primary outcome measures: change from baseline in the Hamilton Rating Scale for Depression [HAM-D] and response [set as fall in HAM-D below 8 or by 50%]. On neither of these measures did paroxetine differ significantly from placebo. Table 2 of the Keller article demonstrates that all three groups had similar changes in HAM-D total score and that the clinical significance of any differences between them would be questionable. Nowhere is this acknowledged. Instead:
  1. The definition of response is changed. As defined in the “Method” section, it has a nonsignificant p value of .11. In the “Results” section [without any explanation], the criterion for response is changed to reduction of HAM-D to below 8 [with a p value of .02]. By altering the criterion for the categorical measure of outcome, the authors are able to claim significance on a primary outcome measure.
  2. In reporting efficacy results, only “response” is indicated as a primary outcome measure, and it could be misunderstood that response was the primary outcome measure. Only in the discussion is it revealed that “Paroxetine did not separate statistically from placebo for…HAM-D total score,” without any acknowledgment that total score was one of the two primary outcome measures. The next sentence is a claim to have demonstrated efficacy for paroxetine.
Thus a study that did not show significant improvement on either of two primary outcome measures is reported as demonstrating efficacy. Given that the research was paid for by Glaxo-Smith-Klein, the makers of paroxetine, it is tempting to explain the mode of reporting as an attempt to show the drug in the most favorable light. Given the frequency with which it is cited in other scientific papers, at conferences and educational functions, and in advertising, this article may have contributed to the increased prescribing of SSRI medication to children and adolescents. We believe it is a matter of importance to public health that you acknowledge the failings of this article, so that its findings can be more realistically appraised in decision-making about the use of SSRIs in children.
With a careful reading, they saw through to the essence of what was wrong without the benefit of any of the back story, the raw data, or the numerous analyses that have followed over the years about this study. It’s a great example for all of us to emulate. Being a doctor is hard work by any standard, and we feel good about putting in all the extra time it takes to stay current. I doubt there’s any profession that can claim the "life-long-learning" moniker any more than we can. You never really graduate from medical school and there’s a never ending series of tests [AKA patients] as long as you’re in the game. So we get used to scanning, reading non-critically, in part because of the volume. But every one of us needs to learn how to recognize the signs that a given article needs to be read like Jon and Ann read this one. The modern industry sponsored Clinical Trial literature in all of medicine is filled with articles that need a long second look. Without thinking, I coined a phrase answering a reporter’s questions about our paper, "it wasn’t sin – it was spin." In the political arena, they call it plausible deniability. I don’t really believe it wasn’t sin [it may be the biggest sin of all because it’s the kind people get away with]. But the phrase still conveys a useful diagnostic take-home message to remind us what we’re on the lookout for…
Mickey @ 8:00 PM

study 329 ix – mystic statistics…

Posted on Thursday 17 September 2015

Most of us have an incomplete knowledge of Statistical Analysis unless we’ve had formal training and hands-on experience, yet we tend to accept the output from the computer’s statistical packages as if it’s dogma. In academic and commercial laboratories, we count on Statisticians [or trained SAS Programmers] to generate those abstract lettered indices that we discuss as if they’re absolutes – p, d, k, SEM, SD, NNT, OR, etc. And even the experts can’t check things with a pad and pencil. So we’re vulnerable to subtle [and even not so subtle] deceptions. In our RIAT Team’s reanalysis of Study 329, we had decided to follow the a priori protocol, which meant sticking to the protocol defined outcome variables and ignoring those later exploratory variables [in blue in Keller et al‘s Table 2] as discussed earlier.

The Study 329 protocol is clear and precise about statistical testing: parametric Analysis of Variance for the continuous variables and Logistic Regression for the categorical [yes/no] variables. They specified a model containing treatment and investigator, with contingencies for interactions between them [since I’ve already put the non-stat-savvy set to sleep, I’m going to dumb this down a bit going forward]. We noticed that our p values differed from those in both the Keller et al paper and the CSR [Full study report acute], even though our open source statistical package [R] is equivalent to their commercial package [SAS] – both available in the Secure Data Portal provided by GSK. While the results for the protocol-defined variables were not significant, the numbers still should’ve been close to the same. And there was something else. They were reporting statistics for Paroxetine vs Placebo and Imipramine vs Placebo, and saying that the study was not powered to test Paroxetine vs Imipramine – all pairwise comparisons. Why this was important takes a little explaining.
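For the statistically inclined, here is a minimal sketch of what those protocol-specified models look like in R. The data frame, column names, and simulated numbers are hypothetical, just to show the two model forms:

set.seed(329)

# Hypothetical stand-in for the trial data: three arms, several investigator
# sites, one continuous outcome [change in total HAM-D] and one categorical
# outcome [responder yes/no].
dat <- data.frame(
  treatment    = factor(rep(c("paroxetine", "imipramine", "placebo"), each = 30)),
  investigator = factor(rep(1:6, length.out = 90)),
  hamd_change  = rnorm(90, mean = -9, sd = 6),
  responder    = rbinom(90, 1, 0.6)
)

# Continuous variables: parametric Analysis of Variance with treatment and
# investigator in the model [the treatment-by-investigator interaction can be
# added as the protocol's contingency].
cont_fit <- aov(hamd_change ~ treatment + investigator, data = dat)
summary(cont_fit)                 # the treatment row is the omnibus F test across all three arms

# Categorical variables: logistic regression with the same terms.
cat_fit <- glm(responder ~ treatment + investigator, family = binomial, data = dat)
anova(cat_fit, test = "Chisq")    # likelihood-ratio chi-square for the treatment term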

When a dataset has only two groups [as in a study of Paroxetine vs Placebo], pairwise statistical comparisons with something like the familiar t-test are perfectly appropriate. But when you run statistical comparisons on datasets with more than two groups, there’s a two-step process. First you test the whole dataset using an OMNIBUS statistical test like Analysis of Variance [ANOVA]. If the omnibus test is significant, then you can run pairwise tests between the various groups to find where the significance lies. But if the OMNIBUS test is not significant, it means you haven’t demonstrated any differences among the groups – and that’s the end of that. The pairwise tests are immaterial no matter how they come out. Keller et al had skipped the OMNIBUS tests altogether [never mentioned in the protocol, the paper, or the CSR]. Our results were the OMNIBUS statistics, and that’s why they were different. With the protocol-defined variables under consideration, it didn’t matter, since nothing was significant no matter what your method. So the question became, "Why skip the OMNIBUS statistical tests at all?"
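To make that two-step logic concrete, here is a toy version in R [the numbers are simulated and purely illustrative]:

set.seed(1)
change <- c(rnorm(30, -10, 7), rnorm(30, -9, 7), rnorm(30, -9, 7))   # hypothetical change scores
arm    <- factor(rep(c("paroxetine", "imipramine", "placebo"), each = 30))

fit       <- aov(change ~ arm)
omnibus_p <- summary(fit)[[1]][["Pr(>F)"]][1]    # p value of the OMNIBUS F test

if (omnibus_p < 0.05) {
  # only a significant omnibus result licenses asking *where* the difference lies
  print(TukeyHSD(fit))
  # or: pairwise.t.test(change, arm, p.adjust.method = "holm")
} else {
  message("Omnibus test not significant - the pairwise comparisons are moot.")
}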

Since we had decided to drop those non-protocol variables because they were declared post hoc [see the last two posts], we had never run the full statistical model analysis on them. But I remembered a spreadsheet we did on a rough pass through this data when we were first getting started. The results are shown here [the OMNIBUS tests are in the far right column and all significant values are shown in red]:

The protocol-specified variables [white background] are not significant, just as Keller et al reported. But look at the non-protocol variables [gray background]. Only two were OMNIBUS-significant. And look at the columns measuring strength of effect [EFFECT SIZE, NNT, ODDS RATIO]. Except for the HAM-D DEPRESSED ITEM, those exploratory variables are pretty lame [weak]. While this was a crude first take without considering the investigator covariate, it suggests that the OMNIBUS statistics didn’t help their cause, so they were conveniently ignored. That could offer a plausible explanation for why they skipped the OMNIBUS statistical test altogether [in fact, it’s the only explanation I can think of]. Recalling that spreadsheet, I went back and ran the “full monty” model on these variables, and three of them came in right around the p<0.05 wire after all: as expected, the HAM-D DEPRESSED ITEM yielded p=0.0032; the others just straddled the cut-off, with HAM-D REMISSION at p=0.0504 and CGI IMPROVEMENT at p=0.0493. Those last two were marginal at best, hardly seeming clinically relevant. And there was something else [see below]. The LOCF dataset for K-SADS-L was very difficult to judge since it was an every-other-week metric and a number of subjects got off schedule, but for what it’s worth, I could never find the reported significance with various shots at defining the LOCF dataset. Running the full model, I got p=0.0833 OMNIBUS and p=0.0662 for Paroxetine vs Placebo.

Just one more piece of techno-babble. There’s something more to say about those two minimally significant exploratory variables:

Both of the non-protocol categorical variables were only significant in week 8, suggesting to me that they were probably outliers [flukes]. And, as mentioned earlier, even if you include the rogue non-protocol exploratory variables, applying any correction for multiple variables would wipe out statistical significance for three of the four. That leaves the HAM-D DEPRESSED ITEM as the only statistically significant finding in this entire study – one question on a multi-item rating scale! So in order for Keller et al to reach the conclusion "Paroxetine is generally well tolerated and effective for major depression in adolescents," all three things had a part to play: no correction for multiple variables; redefining a priori to mean before the blind is broken rather than before the study begins; and ignoring the OMNIBUS statistical test.
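As a rough check on that multiplicity point, you can run the full-model p values quoted above through any standard correction in R [these are my rough numbers, not a formal reanalysis]; only the HAM-D DEPRESSED ITEM stays under 0.05:

# Rough full-model p values quoted in this post for the four non-protocol variables
p_raw <- c(hamd_depressed_item = 0.0032,
           hamd_remission      = 0.0504,
           cgi_improvement     = 0.0493,
           ksads_depression    = 0.0662)

round(p.adjust(p_raw, method = "holm"), 4)         # step-down Bonferroni [Holm]
round(p.adjust(p_raw, method = "bonferroni"), 4)   # the simplest possible correction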

I know these posts are TMI [too much information], so this is the end of all my number chatter. To my way of thinking, Study 329 has become a paradigm, emblematic of the widespread, subtle distortion of the tools of scientific analysis in Clinical Trials in the service of commercial gain. We wrote this RIAT paper to correct the existing scientific literature, but also to give the clear message that if you publish Clinical Trials that disseminate misinformation to physicians and patients, they might just be coming right back at you. And, in the future, with greater Data Transparency and awareness, it won’t take another fourteen years to make the circuit…
Mickey @ 4:00 PM

study 329 viii – variable variables decoded…

Posted on Thursday 17 September 2015

WARNING AGAIN: Sometimes the devil really is in the details. Here too, we need to get pretty far into the details…

The 528-page Full study report acute is an exhaustive narrative of the entire study, including the results in much more detail than the published article – filled with charts and tables. It was released in part by court order in 2004 and in full in 2012. The sections that relate to our questions about the changing outcome variables in the published paper are scattered around. The first reference is on page 44:
Prior to opening the blind, the sponsor and investigators developed a plan to analyze the efficacy data. The plan described a definition of responders and called for additional measures of effectiveness. These included the depression items from the HAM-D and K-SADS-L instruments, and the plan provided for a status of remission…
The reason for the discrepancy mentioned in the last post is simple. They just changed the outcome variables, redefining the meaning of a priori in the process. It’s intended to mean before starting the study. They’ve changed it to mean before the blind is broken [which comes at the end of the study]. They continue on page 49:
Primary Efficacy Variables
The protocol defined the primary efficacy parameters for comparing the efficacy of each active treatment with that of placebo to be:
  • The change from baseline in total HAM-D score at endpoint of the acute phase.
  • The percentage of responders at the endpoint of the acute phase.
Initially the protocol defined a "responder" as a patient whose HAM-D at endpoint was at least 50% lower than the baseline score. This definition was to be used as an "operational" criteria for entry into the continuation phase.

Prior to opening the blind, the sponsor and the investigators developed an analytical plan. Among other issues, this agreed plan included a definition of a "responder" and a "remission" status. The intent was to provide a robust definition of "response" and to describe a status of "remission" in order to provide a rigorous anchor point in analyzing relapses in the continuation phase.

The agreed analytical plan described a "responder" as a patient whose HAM-D score was 8 or less or was reduced from baseline by at least 50%. The remission status was defined as a HAM-D score of 8 or less. The agreed analytical plan also called for the following measure of effectiveness to be included in the analysis: the 9-item depression subscale of the K-SADS-L, the depression item from both the HAM-D and the K-SADS-L, and two methodologies for analyzing the clinical global improvement score: 1) the mean scores and 2) the proportion of patients with rating of "1" or "2" ("very much" or "much improved" respectively). The initial protocol described the K-SADS-L and CGI instruments as secondary measures.

The protocol defined as secondary measures the behavior and functional instruments. These included the Autonomic Function Checklist (AFC), the Self-Perception Profile (SPP), and the Sickness Impact Scale (SIP). The agreed analytical plan included a time to sustained response and various subsidiary covariate analyses of response as secondary analyses.

If you look back at the last post, the name changes – depression-related variables and declared a priori – make more sense. They simply cover up the fact that they’ve changed the outcome variables by changing the meaning of a priori [without really letting you in on the change]. The terms depression-related variables and plan of analysis appear to have come from an email exchange among the ghost writer and the two SKB authors during editing. And speaking of changes, remember that this is part of Appendix B of that self-same Protocol:

II.  PROTOCOL AMENDMENTS
  No changes to the study protocol will be allowed unless discussed in detail with the SmithKline Beecham (SB) Medical  Monitor and filed as an amendment/modification to this protocol. Any amendment/modification to the protocol will be adhered to by the participating centre (or all participating centres) and will apply to all subjects following approval as appropriate by the Ethical Review Committee or Institutional Review Board.

You might need to take a look at the table again. I’m going to resist the temptation to go on and on about how deceitful that was, since Study 329 will likely become the enduring symbol for why we strictly adhere to the before starting the study meaning of a priori in Clinical Trials going forward. But since we were looking to reanalyze this study, we felt obligated to look for documentation that this plan of analysis actually occurred as described in the CSR. The volumes of archived documents available from the various lawsuits yielded only the Draft Minutes of an April 22, 1997 Teleconference call a few months before the blind was broken. After repeated requests to GSK, we received another – a copy of an email documenting that an SAS programmer was coding for some of the suggested new analysis, again from the period right before the blind was broken. From the perspective of our reanalysis of the data from Study 329, our path was clear. The point was to aim towards adding a correct analysis of the study to the medical literature, so we planned to use the outcome variables outlined by the original a priori protocol [from before starting the study] and to only mention the late addition non-protocol variables in passing without analysis.

Parenthetically, in the three years since I first happened onto the data from Study 329, there has been an intense debate about Data Transparency – the AllTrials Initiative, the EMA, the FDA/NIH, Journals [BMJ], even some PHARMAs, etc. But there’s anything but a consensus about what Data Transparency actually means. The CSR [Clinical Study Report] is often mentioned as the thing to release into the Public Domain for Data Transparency. Every time I read that, I cringe [see post-it notes…]. These CSRs seem to vary a lot from trial to trial, but many, like this one, don’t have the raw data included. In this case, that data was listed as Appendices, but they weren’t released along with the narrative in 2004. And nobody much noticed for 8 years!  The CSR, like the published paper, is only an authored proxy for the data itself. In fact, in the case of Study 329, the CSR is as flawed as the paper itself, if not even more so. Data Opacity reigned.

In this case, we are asked to believe that the authors, uninfluenced by any foreknowledge of the results, decided to completely revamp the outcome analysis of this study in its 11th hour, and just happened to pick new variables that just happened to turn out to be the only ones achieving statistical significance. I don’t personally believe that for a second, but what I believe is hardly the point. Belief has no place in such a high stakes enterprise. There are only two things that guarantee the integrity of a Clinical Trial’s analysis – the a priori protocol and the blinding itself. Since there’s no real way to be sure about the blind, the a priori protocol [from before starting the study] is and should be sacrosanct.

We don’t even have to ask why one might want to change variables along the way – the new ones were significant and the old ones weren’t. But as we’ll soon see, that wasn’t the only change that occurred along the way…
Mickey @ 12:00 PM

study 329 vii – variable variables?…

Posted on Thursday 17 September 2015

WARNING: Sometimes the devil is in the details. Here, we need to get pretty far into the details…

RATING SCALES


HAM-D     Hamilton Depression Rating Scale
K-SADS-L  Kiddie Schedule for Affective Disorders – Lifetime
CGI       Clinical Global Impressions Scale
AFC       Autonomous Function Checklist
SPP       Self-Perception Profile
SIP       Sickness Impact Profile

It is essential in Clinical Trials to declare the specific outcome variables and the statistical analytic methodology in the a priori protocol [before the study begins]. That’s the only way to assure that the methodology hasn’t been adjusted, jury-rigged to make the data look a certain way. The standard is to not even allow for the possibility of mistrust. And any changes to the a priori protocol need to be added as an amendment or modification, approved by the certifying agency, i.e. the Institutional Review Board [IRB]. In a recent study of Clinical Trials in Psychiatry [Is Mandatory Prospective Trial Registration Working to Prevent Publication of Unregistered Trials and Selective Outcome Reporting?], only one-third of the studies from the five journals with the highest impact rating in Psychiatry abided by this rule, and in the end, only 14% of the total cohort carried out the analysis precisely as described in the a priori protocol – even though it was a requirement for publication in all five journals.

The protocol for Paxil Study 329 was written well before the actual Clinical Trial began and was quite clear in reference to the outcome variables. They are unambiguously declared in the a priori protocol:
Primary Efficacy Variables
• Change in total HAMD score from beginning of treatment phase to the endpoint of the acute phase.
• The proportion of responders at the end of the eight week acute treatment phase. Responders are defined as 50% or greater reduction in the HAM-D or a HAM-D score equal to or less than 8.
Secondary Efficacy Variables
• Change from baseline to endpoint (acute phase) in the depression items of the K-SADS-L, global impressions, autonomic function checklist, self perception profile and sickness impact scale.
• The number of patients who relapse during the maintenance phase.
Likewise, the requirements for changing the protocol are explicit:
PROTOCOL AMENDMENTS
No changes to the study protocol will be allowed unless discussed in detail with the SmithKline Beecham (SB) Medical  Monitor and filed as an amendment/modification to this protocol. Any amendment/modification to the protocol will be adhered to by the participating centre (or all participating centres) and will apply to all subjects following approval as appropriate by the Ethical Review Committee or Institutional Review Board.
Yet in the published 2001 paper, we find that some of the outcome variables have been changed [in the Efficacy and Safety Evaluation Section]:
"The protocol described two primary outcome measures: (1) response, which was defined as a HAM-D score of <8 or a >50% reduction in baseline HAM-D score at the end of treatment; and (2) change from baseline in HAM-D total score. Five other depression-related variables were declared a priori: (1) change in the depressed mood item of the HAM-D; (2) change in the depression item of the K-SADS-L; (3) Clinical Global Impression (CGI) improvement scores  of 1 (very much improved) or 2 (much improved); (4) change in the nine-item depression subscale of the K-SADS-L; and (5) mean CGI improvement scores."
Notice that the two primary outcome measures are identified as protocol described, whereas the remaining five are not labeled as coming from the protocol, nor are they called secondary outcome variables. Instead, they’re identified as depression-related variables and declared a priori. Then later in the article, there’s yet a different version in the Efficacy Results section text and Table 2 [page 766], adding in response:
Of the depression-related variables, paroxetine separated statistically from placebo at endpoint among four of the parameters: response (i.e., primary outcome measure), HAM-D depressed mood item, K-SADS-L depressed mood item, and CGI score of 1 (very much improved) or 2 (much improved) and trended toward statistical significance on two measures (K-SADS-L nine-item depression subscore and mean CGI score).

non-protocol variables in blue

I’ve colored the non-protocol variables blue. Those differences are the very devil in the details that go unexplained in the article itself. In fact, without having the protocol in hand, one would not likely notice them [and very few did notice them]. Here’s a summary of the various versions:

This discrepancy is what caught the attention of a few early nay-sayers about this article. All four of the significant variables in Table 2 are nowhere mentioned in the protocol, so the designation, a priori, doesn’t really make a bit of sense. For that matter, why use the term depression-related rather than Secondary? And where did response [HAM-D <8] come from? In order to explain these changes, you have to have the 500+ page CSR [Clinical Study Report] available and a keen eye. And recall that the CSR only became available as part of the 2004 settlement between GSK and the State of New York.

So far, only questions, but some answers are just around the corner…
Mickey @ 8:00 AM

study 329 vi – revisited…

Posted on Wednesday 16 September 2015

Well, our RIAT article, Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence, is finally published online at the British Medical Journal. It’s fairly straightforward. The emphasis is on the harms analysis for obvious reasons – an accurate representation of a drug’s safety is always the first order of business. Here’s the abstract and links for the 2001 original publication:
by MARTIN B. KELLER, M.D., NEAL D. RYAN, M.D., MICHAEL STROBER, PH.D., RACHEL G. KLEIN, PH.D., STAN P. KUTCHER, M.D., BORIS BIRMAHER, M.D., OWEN R. HAGINO, M.D., HAROLD KOPLEWICZ, M.D., GABRIELLE A. CARLSON, M.D., GREGORY N. CLARKE, PH.D., GRAHAM J. EMSLIE, M.D., DAVID FEINBERG, M.D., BARBARA GELLER, M.D., VIVEK KUSUMAKAR, M.D., GEORGE PAPATHEODOROU, M.D., WILLIAM H. SACK, M.D., MICHAEL SWEENEY, PH.D., KAREN DINEEN WAGNER, M.D., PH.D., ELIZABETH B. WELLER, M.D., NANCY C. WINTERS, M.D., ROSEMARY OAKES, M.S., AND JAMES P. MCCAFFERTY, B.S.
Journal of the American Academy of Child and Adolescent Psychiatry, 2001, 40[7]:762–772.

Objective: To compare paroxetine with placebo and imipramine with placebo for the treatment of adolescent depression.
Method: After a 7 to 14-day screening period, 275 adolescents with major depression began 8 weeks of double-blind paroxetine [20–40 mg], imipramine [gradual upward titration to 200–300 mg], or placebo. The two primary outcome measures were endpoint response [Hamilton Rating Scale for Depression [HAM-D] score <8 or >50% reduction in baseline HAM-D] and change from baseline HAM-D score. Other depression-related variables were [1] HAM-D depressed mood item; [2] depression item of the Schedule for Affective Disorders and Schizophrenia for Adolescents-Lifetime version [K-SADS-L]; [3] Clinical Global Impression [CGI] improvement scores of 1 or 2; [4] nine-item depression subscale of K-SADS-L; and [5] mean CGI improvement scores.
Results: Paroxetine demonstrated significantly greater improvement compared with placebo in HAM-D total score <8, HAM-D depressed mood item, K-SADS-L depressed mood item, and CGI score of 1 or 2. The response to imipramine was not significantly different from placebo for any measure. Neither paroxetine nor imipramine differed significantly from placebo on parent- or self-rating measures. Withdrawal rates for adverse effects were 9.7% and 6.9% for paroxetine and placebo, respectively. Of 31.5% of subjects stopping imipramine therapy because of adverse effects, nearly one third did so because of adverse cardiovascular effects.
Conclusions: Paroxetine is generally well tolerated and effective for major depression in adolescents.


[upper figure from original] [lower figure from the next paper]

And here’s our RIAT republication:
by Le Noury J, Nardo J, Healy D, Jureidini J, Raven M, Tufanaru C, and Abi-Jaoude E.
British Medical Journal. 2015 …

Objectives: This is a reanalysis of SmithKline Beecham’s Study 329 [published by Keller et al. in 2001], the primary objective of which was to compare the efficacy and safety of paroxetine and imipramine to placebo in the treatment of adolescents with unipolar major depression. The objective of this restoration under the Restoring Invisible and Abandoned Trials [RIAT] initiative was to see whether access to and reanalysis of a full dataset from a randomised controlled trial would have clinically relevant implications for evidence based medicine.
Design: Double-blind randomised placebo-controlled trial. Setting: 12 North American academic psychiatry centres, from 20 April 1994 to 15 February 1998.
Participants: 275 adolescents with major depression of at least 8 weeks in duration. Exclusion criteria included a range of comorbid psychiatric and medical disorders and suicidality. Interventions: Participants were randomised to 8 weeks double-blind treatment with paroxetine [20-40 mg], imipramine [200-300 mg], or placebo.
Main outcome measures: The pre-specified primary efficacy variables were: change from baseline to the end of the 8-week acute treatment phase in total Hamilton Depression Scale [HAM-D] score; and the proportion of responders [HAM-D score <8 or >50% reduction in baseline HAM-D] at acute endpoint. Pre-specified secondary outcomes were [1] changes from baseline to endpoint in the following parameters: depression items in K-SADS-L; Clinical Global Impression; Autonomous Functioning Checklist; Self-Perception Profile; Sickness Impact Scale, [2] predictors of response, [3] number of patients who relapse during the maintenance phase. Adverse experiences were to be compared primarily by using descriptive statistics. No coding dictionary was pre-specified.
Results: The efficacy of paroxetine and imipramine was not statistically or clinically significantly different from placebo for any pre-specified primary or secondary efficacy outcome. HAM-D scores decreased by 10.73 [9.134 to 12.328], 8.95 [7.356 to 10.541] and 9.08 [7.450 to 10.708] points, least-squares mean [95% Confidence Interval], respectively, for the paroxetine, imipramine and placebo groups [p = 0.204]. Clinically significant increases in harms were observed, including suicidal ideation and behaviour and other serious adverse events in the paroxetine group and cardiovascular problems in the imipramine group.
Conclusions: Neither paroxetine nor high-dose imipramine demonstrated efficacy for major depression in adolescents, and there was an increase in harms with both drugs. Access to primary data from trials has important implications for both clinical practice and research, including that published conclusions about efficacy and safety should not be read as authoritative. The reanalysis of Study 329 illustrates the necessity of making primary trial data available to increase the rigour of the evidence base.

Keller et al concluded that Paroxetine was effective in adolescent depression in this study, and we found that it wasn’t. The difference was not in the data. The data was the same. And while there were some differences in the analytic approach, that doesn’t explain the different conclusion either. The difference was in what the two groups considered to be the outcome measures. We followed SKB’s a priori protocol definition of primary and secondary variables, whereas they added four new outcomes. Without those add-on outcomes, there were no significant findings. From our paper [page 4]:
Both before and after breaking the blind, however, the sponsors made changes to the secondary outcomes… We could not find any document that provided any scientific rationale for these post hoc changes, and the outcomes are therefore not reported in this paper…

Outcome variables not specified in protocol:
There were four outcome variables in the CSR and in the published paper that were not specified in the protocol. These were the only outcome measures reported as significant. They were not included in any version of the protocol as amendments (despite other amendments), nor were they submitted to the institutional review board. The CSR (section 3.9.1) states they were part of an “analysis plan” developed some two months before the blinding was broken. No such plan appears in the CSR, and we have no contemporaneous documentation of that claim, despite having repeatedly requested it from GSK.
In the paper, they didn’t explain adding these new outcomes. It just happened. In the CSR [Clinical Study Report] made public in 2004, they described the process of this change ["analysis plan"], but the "why" didn’t come until several years later, in a response to a challenge. In the course of our re-analysis, I think we ended up with a fairly clear picture of that story. Several things along the path were involved. The simplest was an omission in the original a priori protocol: they neglected any discussion of correcting for multiple variables. With as many variables as were assessed in this study, that was a very significant omission. Had they applied absolutely any of the available correction schemes needed as a check on false positives, even with the other changes, they could never have claimed a positive efficacy outcome.

There were several other subtle maneuvers that shaped their outcome claims. I’ll leave them for later. Right now, it’s enough to ask you to look over our paper itself. But I will say that the erroneous conclusion in Keller et al can hardly be chalked up to a mistake. It shows too many tell-tale signs of intention. But in spite of all the documents amassed about this study, there’s no way for anyone to really know all the forces at work fourteen years ago when this article was written, or what the process among the various players and authors actually was. We do know that the CSR [Clinical Study Report] used by ghost-writer Sally Laden already contained these subtleties, so I presume that they happened before the summary ever got to her. That’s about as close as I can get. The next several posts are what I can piece together about this story…

Mickey @ 6:30 PM

emil kraepelin – born again [again]…

Posted on Tuesday 15 September 2015

Freud, Bleuler, Jaspers, Meyer, Kraepelin

I’ve always been almost as interested in the history of the scientists as in the science they pursued. And one of the pleasures of psychiatry for me was that this was an important perspective, specialty-wide. That was even more true in psychoanalysis, where history is really central. So the lives and times of Karl Jaspers, Emil Kraepelin, Sigmund Freud, Kurt Schneider, Adolf Meyer, Carl Jung, Harry Stack Sullivan, etc. were intermingled with the ideas they championed. In 1980, with the coming of the DSM-III and the bio-medical model, it felt like our populated past disappeared and there was only Emil Kraepelin. But worse than that, it wasn’t the Emil Kraepelin I had gotten to know. It was as if I had developed a case of the Capgras Syndrome.

The new new Kraepelin [the American neo-Kraepelinian Emil Kraepelin] felt like a cardboard cutout character from a morality play. He had a monocular biological view of mental illness – end of story. The one I used to know collaborated with Alois Alzheimer, distinguished Affective Psychosis from Thought Disorder [Dementia Praecox], looked at the clinical course of the illnesses, wrote extensive case notes, constantly revised his own work, obsessively wrote down his dreams in an attempt to understand Schizophrenic thought, was a moral zealot opposed to Alcohol and the carnal life [Syphilis], became a German Nationalist who sounded like a proto-Nazi eugenicist anti-Semite at times in his later life, etc. etc. In other words, he was a complicated, multidimensional character, among the many other such complex [and often brilliant] real people who studied mental illness and struggled with René Descartes’ mind-body dualism. He was a part of a great big story, not the story itself.


Emil Kraepelin  Alois Alzheimer  Auguste Deter

So it’s good to see the possibility that my old teacher, Emil Kraepelin, might be born again [again]. I’ve missed him…
by Eric J. Engstrom and Kenneth S. Kendler
American Journal of Psychiatry. Published online: September 11, 2015.

In the last third of the 20th century, the German psychiatrist Emil Kraepelin [1856–1926] became an icon of postpsychoanalytic medical-model psychiatry in the United States. His name became synonymous with a proto-biological, antipsychological, brain-based, and hard-nosed nosologic approach to psychiatry. This article argues that this contemporary image of Kraepelin fails to appreciate the historical contexts in which he worked and misrepresents his own understanding of his clinical practice and research. A careful rereading and contextualization of his inaugural lecture on becoming chair of psychiatry at the University of Tartu [known at the time as the University of Dorpat] in 1886 and of the numerous editions of his famous textbook reveals that Kraepelin was, compared with our current view of him, [1] far more psychologically inclined and stimulated by the exciting early developments of scientific psychology, [2] considerably less brain-centric, and [3] nosologically more skeptical and less doctrinaire. Instead of a quest for a single “true” diagnostic system, his nosological agenda was expressly pragmatic and tentative: he sought to sharpen boundaries for didactic reasons and to develop diagnoses that served critical clinical needs, such as the prediction of illness course. The historical Kraepelin, who struggled with how to interrelate brain and mind-based approaches to psychiatric illness, and who appreciated the strengths and limitations of his clinically based nosology, still has quite a bit to teach modern psychiatry and can be a more generative forefather than the icon created by the neo-Kraepelinians.
Mickey @ 10:44 PM

ambivalence…

Posted on Tuesday 15 September 2015


Dear Colleagues,

After serving 13 years as Director for the National Institute of Mental Health [NIMH], Thomas R. Insel, M.D., will step down effective November 1, 2015…

While we conduct a national search for a new NIMH Director, Bruce Cuthbert, Ph.D., will serve as Acting Director…

Francis S. Collins, M.D., Ph.D.
Director, National Institutes of Health
Can you feel relieved and worried at the same time?

New York Times
By BENEDICT CAREY
SEPT. 15, 2015

… Dr. Insel, a brain scientist who made his name studying the biology of attraction and pair bonding, was the longest-serving director since Dr. Robert H. Felix, the agency’s founder, stepped down in 1964. Appointed in 2002, his tenure spanned four presidential terms, during which he honed an easygoing political persona and an independent vision of the agency’s direction. He steered funding toward the most severe mental disorders, like schizophrenia, and into basic biological studies at the expense of psychosocial research, like new talk therapies.

He was outspoken in defense of this path, at one point publicly criticizing establishment psychiatry for its system of diagnosis, which relies on observing behaviors instead of any biological markers. His critics – and there were plenty – often noted that biological psychiatry had contributed nothing useful yet to diagnosis or treatment, and that Dr. Insel’s commitment to basic science was a costly bet, with uncertain payoff…

… In his statement, Dr. Insel said the final details of his move to Google were not firm. The team is developing advanced technologies for better detection and prevention of illness, he wrote, and “I am joining the team to explore how this mission can be applied to mental illness”…
One can look at it like Benedict Carey does in this piece. He’s a reporter with a keen eye for such things. And what he says is certainly accurate – "He steered funding toward the most severe mental disorders, like schizophrenia, and into basic biological studies at the expense of psychosocial research, like new talk therapies" – and that was certainly a big problem. But that’s not what bothered me so much about Dr. Insel’s reign at NIMH. It’s the word, "steered." The way I’ve thought of it in my mind, he misunderstood the meaning of his title – Director. It’s supposed to mean that he directs an Institute and its infrastructure in a way that locates the best and brightest scientists we have and provides the support they need to do those things that the best and brightest do – bring the scientific apparatus to bear on the problems they have insights into. The scientists generate the projects; the NIMH evaluates the relevance and feasibility of those ideas and supports the best and brightest of the lot. Dr. Insel interpreted the word director as meaning he directed what those projects were going to be, and the scientists followed his directions [if they wanted to be funded].

Besides being too controlling, Insel is a "breakthrough freak." He seems to go for the "shiny objects." So "personalized medicine" comes along and we hear about that. Then we hear about "neural circuits." One after another, we’ve moved from potential breakthrough to potential breakthrough as if there’s some over-riding plan, but we never quite found out what it was. All we really knew was that whatever it was, it came under the heading "clinical neuroscience." He went to medical school and did a psychiatry residency, but he never practiced medicine, and that has been apparent throughout his tenure at the NIMH. He has had the perspective of a recent graduate – unseasoned by the experience of real-life medical practice. In the words of my current neighbors, "book larnin’". So I’m relieved at his leaving and immediately worried about what’s coming next.

But that’s not the only worry. He’s going to Google, a big resource that’s capable of pulling off just about anything they set their mind to do. And I’m worried that Insel will point them in the direction of screening for mental illness. In my mind, that means putting more people on even more psychiatric drugs they don’t need. He’s a nut case for "the global burden of depression" and other such buzz phrases. Those ideas plus Google are a recipe for some real problems.

However this transition plays out, his replacement and his future placement are definitely things to watch very carefully…
Mickey @ 7:00 PM

time for some pushback?…

Posted on Tuesday 15 September 2015


British Medical Journal
by Khaled El Emam, Tom Jefferson, and Peter Doshi
15 Sep, 2015

The European Medicines Agency (EMA) has issued its long anticipated new policy (policy 0070) on prospective access to clinical trial data, and is now in consultations to figure out the details of its implementation. We were invited to join these ongoing consultations, and have previously reported on the debate here and here.

We have been particularly concerned about the anonymization and redactions of the content of clinical study reports (CSRs), and especially concerned about the approach proposed by some in industry.

But now we are getting really worried. Current drafts of the EMA’s evolving guidance documents for the anonymization of CSRs leave too much leeway for creative interpretation of acceptable anonymization practices, and an EMA follow-up meeting on 7 September made clear that some industry associations are pressing to apply a standard known as the TransCelerate approach. While almost all approaches sound reasonable (after all, they are intended to protect the anonymity of trial participants—a good thing), the TransCelerate redaction approach would cripple the usefulness of CSRs.

Take a look for yourselves. Figure 1 (below) is a page from a Tamiflu CSR (Research Report No. 1005291) that Roche released to us after a four year long battle for access. Figure 2 shows what would be likely to happen if Roche applied the TransCelerate redaction standard to that same document. Applying the TransCelerate approach takes the Tamiflu document and turns it into a page of black boxes. For instance, all dates relating to individual trial participants have to be redacted, as well as other patient information such as sex, age, weight, height, race, ethnicity, and socioeconomic information. All patient narratives would also have to be removed.

[see figures linked above]

Figure 1: Line listing from Tamiflu trial WV16277 (Research Report No. 1005291) redacted by Roche for public release. Available from http://dx.doi.org/10.5061/dryad.77471

Figure 2: Line listing from Tamiflu trial WV16277 (Research Report No. 1005291) redacted according to the TransCelerate guidance. Available from http://dx.doi.org/10.5061/dryad.77471

Why “likely to happen” and not “happen”? Because we had to create figure 2 ourselves. Ideally, those advocating a redaction approach would send around shared examples for the rest of us to see and discuss. But there were no clear examples at the EMA meetings.

Using redactions to assure the anonymization of data in CSRs is emerging as a make or break issue for the success of the EMA initiative. The intensive redaction of the TransCelerate approach risks nullifying most of the progress towards transparency made so far in Europe…
I kept a timeline of the EMA Data Transparency saga through this time last year [then I got busy]. It’s here for review. As you can see, the news here is bad. Ever since the AbbVie/InterMune suits, the cause of true data transparency has been slowly eroding away at the EMA. The initial offering was too good to be true, but it passed through the mid-point and has kept going south [see also important work… and in the details…].

It seems to me that the history of Clinical Trials of drugs is not unlike the stories told by many of our patients with personality disorders – the solution to the last problem is the beginning of the next problem. With the trials, the last reform movement creates the loophole that allows things to essentially remain as dysfunctional as they’ve always been. Right now, we’re committed to Data Transparency, and we’re watching it be picked apart in front of our eyes. The watchdogs on the byline here are front and center on the case along with others, but they may be like the little Dutch Boy, running out of fingers to stick in the leaks.

The one bright side of this story is that the EMA has responded to a public outcry in the past [see the timeline for examples], and we may be approaching time for another all out effort…
Mickey @ 6:20 PM

study 329 v: into the courtroom…

Posted on Saturday 12 September 2015

When you read an article in a medical journal, all you have to go on is what you’re told in the article itself. If you watched Dr. Healy’s commentary [background music…], you know that this 11-page article represents 77,000 pages of data locked away in some data archive out of sight, a compression ratio of 7,000:1! And if you question an article, there’s no real way to answer your questions without that data. In this case, because of a legal challenge in 2004, the Clinical Study Report has been available on the Internet for a long time. It’s a 528-page document prepared for submission to regulatory agencies. Over the years, many have read it over and over and found further things to fuel our contention that the original article reached an indefensible conclusion. But all that really did was further refine suspicions. It didn’t prove a thing:
In 2012, GSK finally posted the actual data [Appendices B, C, and D] as they had agreed to do in 2004, and so the numbers were there to see. So many numbers! And the only way to analyze them would be to hand copy them into some electronic format that could be input into a statistical program for reanalysis. I had a shot at that [cataloged in the lesson of Study 329: an unfinished symphony…], but there were so many numbers! Too many. I did enough to gain the conviction that this study was as far off the mark as it appeared. But it was only when we got the raw data in an electronic format that we could really do a complete analysis like the one we are publishing. I hasten to add that the form that data came in was a real challenge – a restrictive remote desktop that made the data manipulation very difficult.
The safety analysis required more data access. The transcribed numbers in the IPD tables for the rating scales were fine for the efficacy part, but the IPD version of the Adverse Events wasn’t enough. We needed to look at the actual forms filled out during the study by the blinded clinicians and raters to approach the level of nuance needed to reach any conclusions about harms.
Our article isn’t really about Paxil Study 329. Reporters like Shelley Jofre of BBC’s Panorama and Alison Bass, who wrote Side Effects, along with legal actions from patients and governments, brought it to the fore. The courts have levied punishments and record-breaking fines already. And our group has been able to add a counter to the original article in the JAACAP, which still sits in our libraries un-retracted.

The broader point of our article is that physicians and the patients we advise have an absolute right to look at the raw data behind the abbreviated proxies that appear in our literature as journal articles. When we have that kind of access, the playing field is level and the profession has the necessary means to join in the kind of checks-and-balances system that keeps people honest. Our paper is an example of how we think that information should be presented. Further, the medical profession has an absolute obligation to do whatever it needs to do to ensure that the information we pass on to our patients as scientific transcends other influences – including commercial profit or the academic advancement of the authors.

It’s a paradox that many of the authors who have lent their reputations and the reputations of their universities to these jury-rigged Clinical Trials preach a gospel of evidence-based medicine. And these questionable Clinical Trial articles are certainly filled with icons representing the tools of science – graphs, tables, p-values, standard deviations, etc. But they hide the only basic scientific tool we will ever have – the carefully gathered primary observations we call data. The real evidence never makes it into the courtroom…

Mickey @ 8:00 AM