We believe that the Keller et al. study shows evidence of distorted and unbalanced reporting that seems to have evaded the scrutiny of your editorial process. The study authors designated two primary outcome measures: change from baseline in the Hamilton Rating Scale for Depression [HAM-D] and response [set as fall in HAM-D below 8 or by 50%]. On neither of these measures did paroxetine differ significantly from placebo. Table 2 of the Keller article demonstrates that all three groups had similar changes in HAM-D total score and that the clinical significance of any differences between them would be questionable. Nowhere is this acknowledged. Instead:
The definition of response is changed. As defined in the “Method” section, it has a nonsignificant p value of .11. In the “Results” section [without any explanation], the criterion for response is changed to reduction of HAM-D to below 8 [with a p value of .02]. By altering the criterion for the categorical measure of outcome, the authors are able to claim significance on a primary outcome measure. In reporting efficacy results, only “response” is indicated as a primary outcome measure, and it could be misunderstood that response was the primary outcome measure. Only in the discussion is it revealed that “Paroxetine did not separate statistically from placebo for…HAM-D total score,” without any acknowledgment that total score was one of the two primary outcome measures. The next sentence is a claim to have demonstrated efficacy for paroxetine.
Thus a study that did not show significant improvement on either of two primary outcome measures is reported as demonstrating efficacy. Given that the research was paid for by GlaxoSmithKline, the makers of paroxetine, it is tempting to explain the mode of reporting as an attempt to show the drug in the most favorable light…
Dr. Dulcan’s letter says, "Many highly expert child and adolescent psychiatrists were participants in that study, and others that were similar, and others equally expert contributed to the review process." And when interviewed by Panorama, she said, "We rank, and this is a worldwide ranking, we rank number one in child mental health and number two in paediatrics" and "Oh I don’t have any regrets about publishing at all. It generated all sorts of useful discussion which is the purpose of a scholarly journal."
Drs. Jureidini and Tonkin argue that the reviewers failed to understand and appropriately critique the article [and by extension that the editor was not up to the task] and that the authors of the original article swerved from their moral and scientific duty under the influence of the pharmaceutical industry. By extension, of course, they covertly argue that the reader who agrees with them is intellectually and morally superior while a reader who does not agree with their position shares the cognitive and/or moral failing of the rest of us. We say that this article and body of scientific work is a matter for thoughtful and collegial discussion and say, in addition, that their emperor has no clothes.
She took a look at my name tag, and said, "Oh, I’ve heard about you."
Since her expression was somewhere between stern and outright hostile, I queried, "In a good way or a bad way?"
"In a bad way, to tell you the truth." And then she was off on a high volume rant that went something [if memory serves] like this:
"How DARE you write an article in the New York Times saying that your therapy training at Mass General was terrible, and then later having this GREAT AWAKENING that" – she made a religious hand waving gesture – "’Oh, it’s important to understand my patients,’ and then you write an article in order to sell your new book and your newsletter. How are you any different from the drug companies? I was outraged by your article and showed it to my colleagues. What a disservice you have done to psychiatry." And it went on from there…
At the time Study 329 was published, our psychiatric literature was being flooded with pharmaceutical industry funded and managed articles turning barely significant drug effects into clinical wonder drugs using sleight of hand, the reputations of the psychiatric royalty with their prestigious universities, and our time-honored scientific journals to certify the transformations. To wit:
Study 329 is now infamous, not because it is unique for the era. It wasn’t. This kind of article was then, and is now, far too common. Study 329’s notoriety comes from the fact that it went too far in bending the science, it opportunized on the plight of children, and there were enough people who persisted in chasing the truth [Healthy Skepticism, Alison Bass, David Healy, Eliot Spitzer, Senator Grassley, Paul Thacker, many others] against the gradient of psychiatric royalty and industrial influence. The question that remains is what to do with the lessons learned…
Mickey:
I go over a lot of this ground in my book, Side Effects: A Prosecutor, a Whistleblower and a Bestselling Antidepressant. But I’m glad you’re taking another close look at study 329 and arriving at the same conclusions that I (and Jon Jureidini and Leemon McHenry) have. Keep up the great work!
Alison
Ms. Bass,
Your book is worthy of respect, and plays a large part in why Study 329 has become such a cause célèbre.
However, there are multiple issues worthy of focus in relation to this paper, as well as the question: have the appeals for retraction and appeals to the universities led to a literature of more utility to the clinical and research community as a whole?
It is possible to believe that SSRIs can be used responsibly in children, and that there are real psychiatric side effects present with them that possibly may not rise to the level of increasing the rate of completed suicide, AND remain convinced that there needs to be a level of scholarship in the clinical literature that allows it to be a source of guidance – that allows abstracts in PubMed to be a reliable source of guidance.
Dr. Dulcan was the guardian of scholarship in the journal for 2 consecutive 5 year terms because AACAP appointed her as such. AACAP’s executive director in 2007, long after many of the issues regarding Study 329 and the 2001 paper had come to light, lauded her tenure. If you read carefully through their current author instructions, they basically would allow for a single individual (the editor in chief) to greenlight something like the 2001 ABSTRACT to read in an identical fashion. And, for that raw study data to never see the light of day.
In terms of the tens to hundreds of thousands who rely on information derived from conclusions published in JAACAP’s abstracts (clinicians, researchers, reporters, legislators), very few of them have a fiduciary and/or career and/or power-based stake in maintaining the status quo at the journal.
That cannot be said of the leadership of AACAP (at the very least the staff involved then is the same as today), the authors on the study, the universities they belong to, the editors themselves.
The issue of the utility of the SSRIs in kids, whether they lead to increased completed suicides, what people think of the use of medications in kids, the restrictions that should be placed on prescriber-industry interactions are all important. And there will be a diversity of viewpoints among those reading JAACAP abstracts.
There will, I hope, be less of a diversity of viewpoints on the scholarship of abstracts that those tens and hundreds of thousands of people would wish for.
That Dr. Dulcan does not express deep concerns about her handling of what appeared in the 2001 JAACAP abstract, and that AACAP does not express deep concerns about her handling of it, quite apart from whether it rises to the level of scientific error or misconduct, might be of deep concern to readers.
AACAP makes the point that they will police scientific error or misconduct, but that a paper with the conclusion that the 2001 one has in its abstract, while likely statistically negative on 4 out of 8 secondary endpoints (7 out of 8 if corrected as 1BOM advocates) and on 2 out of 2 primary endpoints, does not rise to that level. And, if it does not rise to that level, they say that falls within the discretion of the editor in chief and is apparently not of serious concern to them. This is apparently why Keller and GSK considered them a “less demanding journal.”
It’s fair for the authors to be concerned that the high placebo response/natural improvement was drowning out the signal of what they may have believed was genuine benefit from the paroxetine. OK. Then you explain your largely negative result, explain your own conclusions, and possibly retool your approach.
Nero is said to have lost a wheel during a chariot race and gone off the track, but was awarded the race because the judges deduced that had he not lost the wheel he would have won.
The clinical registries were a valid start. But the raw data needs to eventually be made available. This paper shows the limits of peer review, particularly at JAACAP.
I know that there is a minuscule chance of this message going out in a way that would have real impact. However, those who care about using JAACAP’s abstracts as a source that is of some reliability, and not just writing them off, should care.
I have been impressed by your book and by the efforts of Healthy Skepticism, but I have not seen that result in widespread action within the clinical community.
I keep banging the drum regarding the scholarship and reliability of the information in the abstracts because I think that can result in the broadest coalition of those within the clinical community.
I believe that 1BOM is coming from a slightly different angle on this and that the slant of the conclusions may be slightly different. At least that’s my hope.
1BOM,
Regardless of all that 1BOM, your last post, “the hurdles,” was superb.
Abstracts in JAACAP should play by a different set of rules than a Nero chariot race. If not, then at least we should eventually be able to know what happened. Mandatory release of the raw data a set number of years later would not change the work flow for academia, industry, or publishers, but would provide an additional check.
If not now when? If not the grassroots of AACAP, then who?
I missed the hat tip. Thank you.
Thanks Alison,
I’m covering this well traveled road because I think, perhaps naively, that we’re getting to a point where something might be done about such things. Your book, the persistence of Jureidini and McHenry, the agencies of government, watchdogs like Carroll and Rubin, POGO, and the blogs, came at a time when exposure was on the front burner and continue to exert a powerful influence now that the messages are finally beginning to be heard. It seems like now is a time for policy, and some of that is happening in the realm of Conflicts of Interest and payola [ProPublica, the Sunshine Law]. I’m of the opinion that there’s something else that needs doing – data transparency. If the raw data is available for checking, the temptations to distort it will be dramatically diminished and, if distorted, the watchdogs will be fighting on a level playing field. I can envision the counter-arguments already, but such is the nature of debate. My answer is simply, “don’t do the crime, if you can’t do the time.”
Thanks for all you’ve done and do…
Keep in mind that Dr. Dulcan would not have been able to greenlight the 2001 abstract unless some person(s) at AACAP believed her to be the best person to act as the journal’s guardian. Nor would she then have remained the journal’s guardian from 2002-2007 unless some person(s) at AACAP continued to believe this.
Likely “many expert child and adolescent psychiatrists” were involved.
Perhaps we would get a better outcome if a greater number of expert child and adolescent psychiatrists were involved in these decisions.
Making “the trains run on time” is important. Assuring the quality of what those trains carry is equally, if not more important.
If Study 329 and the 2001 JAACAP abstract is an example of what AACAP considers “letting the science shine through” in JAACAP then that might be of concern.
Again, any reader can read the abstract and then read the first portion of the DOJ complaint. I know physicians can be leery of legal documents, but it is a fair representation of what the documents say (ironically, a much fairer representation of the raw information than the 2001 paper).
It is hard to consider the handling of that paper an aberration when the editor in chief involved expresses no regrets and the executive director commends her on how judiciously she has picked her battles.
Dr. Dulcan spoke for all the experts in 2001 and 2002, but we came to realize that those she was referencing weren’t all truly saying the same thing.
Virginia Anthony and Andres Martin have so far spoken for AACAP and JAACAP. Are who they represent truly all saying the same thing? Are they going to say there is no need for JAACAP to require an agreement that the raw study data after some number of years be released?
At the very least, do they all say that there were no actions of concern taken by the authors and editor in this (apart from the question of scientific error or misconduct)?
“I am but a humble servant” does not appear to capture the BEHAVIOR of Dr. Dulcan, Dr. Keller, Dr. Ryan, Dr. Wagner, and Ms. Anthony. Perhaps it shouldn’t. They all in their own ways are extremely capable individuals or they would not have reached the positions they have.
So I would take exception to the idea that there is not royalty in academic medicine and societies. Whether or not there should be is debatable, but that seems to be present in some form.
Well, the community needs to decide on whether or not it wishes to rely on noblesse oblige. To rely so heavily on interactions between “Ginger” and Dr. Dulcan, between Dr. Dulcan and “Marty,” between Dr. Keller and Dr. Ryan when he speaks of having leverage with “senior management” at GSK in relation to how safety data will be presented, between Dr. Ryan and GSK when he offers to have them vet any presentations or papers that would come out of a re-analysis of the proprietary data from Study 329, or between Dr. Ryan and SKB when he tells them he does not want other academics approached about the study in even the vaguest of terms because, if he doesn’t want to greenlight their site, it could cause professional hard feelings. Or any number of other examples.
This is just how the world works.
The question is how opaque do we want all of this to be?
McCafferty actually worked for GSK and looked like he was pushing for more transparent representation of the safety data. Oakes also worked for GSK and raised concerns about how the endpoints were being represented. Dr. Dulcan didn’t work for GSK and appears to have paid significantly less attention to these issues than they did, particularly in terms of the abstract.
So what good does it do us to have “transparency” mean we just get a summary of who is paying some money to whom?
You made an analogy earlier to the game CLUE in terms of solving the mystery of how the 2001 JAACAP paper came to be. Perhaps Murder on the Orient Express is a more apt analogy. A number of people had their own reasons, and likely considered (and consider) them good ones. Sadly, no author on the paper appears to have even stepped forward to raise an issue. Even Dr. Klein, who would seem the one most likely to.
I suppose that would put you (and a number of others you referenced) in the role of Hercule Poirot.
Be that as it may, it is hard to argue (rather than simply declare by fiat) that “the system” worked.
A key point is that you can access the abstract for free on PubMed, but no information from the letters comes up. It is easy to underestimate the power of the abstracts from these papers when lost in discussions of whether the definitions of emotional lability are a footnote to a table or somewhere in the text. Statements that are basically the equivalent of “it’s somewhere buried in those 6000 words in the paper” hardly seem the zenith of what we would hope for from peer review.
The more I have read your series the more I have come to agree with you about the transparency of the data.
To some extent, researchers need to police themselves. But it seems incredibly naive to allow them to police themselves in what Study 329/the 2001 JAACAP affair shows is a much more opaque process than it purports to be. That shows how concentrated the power around these decisions truly is.
Maybe I’m wrong. Despite the cast of thousands at the different centers, and at the journal, and at GSK, maybe in the end it was simply Dr. Keller and Dr. Dulcan in the study.
But in the end, whether it was by their action, or Drs. Ryan’s and Wagner’s belief in what the data really showed, or whatever, it remains credible to believe that if the raw data was going to be available at some point (and in a form that Dr. Keller et al. could not impact by “playing hardball”), we might have gotten a more scholarly abstract and paper.
Alternatively, GSK might never have published 329, just like it had decided to do with the 2 other negative studies.
But then Dr. Wagner and the sales reps and MDs she was speaking to would not have been speaking of the imprimatur of a “peer reviewed” publication.
We likely would have been better served by a larger group, perhaps with some considered less expert, than by this relatively small group of our betters.
To me a huge take home is that the entire process is much less distributed and much more opaque than it purports to be.
And, that Dr. Dulcan isn’t the one who set it up this way. She was just picked and acted herself within its confines.
I am very curious to see how you address the question of what to do next ….
To whom you will address that response.
And, what you think they will likely do.
Not to be pedantic (I think the Rosa Parks story is important), but she was already sitting in the back of the bus. The rule was that if all the white seats were taken, then blacks had to give up their seat to any white who wanted it. At that moment, she could simply not get up to give that white man her seat on the bus. It wasn’t at all planned, either.
Perhaps a lot of people being labeled with “anti-psychiatry” have hit that wall themselves. There is a point at which maintaining a falsehood requires more energy than resisting it, and challenging it is the only way to keep one’s integrity.
Sorry, didn’t lose a wheel, he fell off.
http://www.randomhistory.com/history-of-olympic-controversies.html
“Arguably, the most famous Olympic controversy involved Roman Emperor Nero in the Games of A.D. 67. Not only did Nero bribe Olympic officials to postpone the Games by two years, he bribed his way to several Olympic laurels. Most notably, Nero competed in the chariot races with a 10-horse team, only to be thrown from his chariot. While he did not finish the race, he was still proclaimed the winner on the grounds that he would have won had he been able to complete the race. After his death the next year, his name was expunged from the victor list (Swaddling 1999). ”
But the concept stays the same.
“he was still proclaimed the winner on the grounds that he would have won had he been able to complete the race.”
The large placebo effect/natural history of recovery in the paper was unfortunate for those authors who believed that it hampered seeing the true statistical significance for paroxetine in the data. As Ryan says in the letter, “as scientists and clinicians we must adjudge whether or not the study overall found evidence of efficacy.”
They may truly believe that there was a strong enough “signal of efficacy” in their data that to not honor it would have resulted in not treating depressed children as well as the data would permit. He speaks of not having the luxury of applying the “simple rules” the FDA might apply.
But this is a fallacy. Perhaps one racer is better overall than another. It doesn’t mean that they should be awarded victory in every race. If we adjudge the data in our study wrongly, then kids don’t get the treatment they need? Under those circumstances, if you are convinced of the need for that medication, how do you ever feel comfortable presenting a negative study?
If the statistical correction for multiple secondary endpoints would really have resulted in there being only 1 statistically significant secondary endpoint, and that one was actually only established after those in the original study protocol, then that looks an awful lot like knowing the result of the race before you run it.
Feel free to believe that the medication should still be used. Argue why.
But call the race that was actually run.
Run it under different conditions next time. It’s a huge challenge with studies this size, but if the findings of a study can’t be largely negative and run contrary to our a priori assumptions is it science or scienciness?
Do we really want to leave the current system intact, where a drug has about as much chance of losing the race to placebo and tincture of time as Nero did at his games?
This does NOT mean there is not mental illness that does NOT simply respond to placebo and/or time. However, that there is does NOT mean the ONE particular treatment you are looking at is significantly more effective.
Or that if you allow the interpretation that the data in your study did not statistically show efficacy, then kids are going to get undertreated. Again, if you believe in the drug and also believe that, then how would you ever feel comfortable reporting a study as largely negative?
An emperor can believe his chariot would have run the race if he hadn’t fallen off.
An author can believe that if they could go back to the start of the study with the benefit of hindsight and add in better secondary endpoints (particularly since “established reliable measures that distinguish medication responder from non-responder at the time the study was designed” – something that may or may not have been clear in their original grant application), and perhaps stay aware that they had a high placebo response rate which they would know “real” depression wouldn’t show, then the study ‘provides a strong “signal” for efficacy.’
The judges can say the emperor’s chariot won the race.
An editor-in-chief/board can greenlight the abstract’s conclusion stating “Paroxetine is generally well tolerated and effective for major depression in adolescents.” They can allow the abstract to state “The two primary outcome measures were endpoint response (Hamilton Rating Scale for Depression [HAM-D] score < or = 8 or > or = 50% reduction in baseline HAM-D) and change from baseline HAM-D score.” and shortly thereafter state “Paroxetine demonstrated significantly greater improvement compared with placebo in HAM-D total score < or = 8, HAM-D depressed mood item, K-SADS-L depressed mood item, and CGI score of 1 or 2.” And, leave it to the reader to have to pick up on the fact that “Hey, wait a minute, HAM-D total score < or = 8 looks like one of the primary endpoints, but sort of doesn’t all at the same time. Huh.”
At least in the case of the emperor and the judge(s) you get to see the race.
In the case of the author and editor(s) shouldn't you someday get to see the race?
Seeing the race can be nice. If, for instance, they also decide to call falling down “prolonged rider contact with large linear surface” because that’s just how you code such things. They might also decide to call such prolonged contact unrelated to the chariot or rider, because … well, just because.
Would it make all the difference if the Emperor was a great rider, his chariot was superb, and they actually won many other races fairly? See, we were right in calling him the winner the first time. Ummm, really?
Look, one of the Paxil events coded as “suicidal” (I think, I should check, how nice that I can) was someone slapping themselves. Do I think that is on the same level as a large Tylenol ingestion? No. Do I think that some of the Paxil events leave themselves open to the question of whether “suicidal” is any more representative a term than “emotional lability”? Yes. Do I trust Sally Linden, Dr. Ryan, Dr. Keller, Dr. Wagner to make those calls without many others having access to the raw data? No. Even if aided by Dr. Dulcan. No.
Sorry, the quote above was:
“Without established reliable measures that distinguish medication responder from non-responders at the time the study was designed,”
Left out the without
Anonymous, why does an adverse effect have to rise to the level of completed suicide to be considered serious? Completed suicide is the extreme of the adverse effects of Paxil, suicidal ideation slightly less so, akathisia a little further down the line, feeling of loss of self or demotivation more common, sexual dysfunction more common still — why are these not dealbreakers in prescribing Paxil and other antidepressants to adults as well as children? Are they not antithetical to the purpose of treatment? Why is “doesn’t cause suicide very often” a selling point?
Altostrata,
I can respect that you have strong views on the SSRIs. The relative merit of treatment with SSRIs is not my central focus. Having a primary literature that better serves the needs of readers, both in clinical and research medicine, is. The pattern of studies such as 329 and publications such as the 2001 JAACAP abstract/paper, writ large, simply does not adequately serve those needs. It’s tiresome to constantly have to read literature wondering: how are they trying to bamboozle us? We should not be having to seek out 1BOM to constantly ferret this out. This is what the hell peer review is supposed to be for.
“This is what the hell peer review is supposed to be for.”
And when the peers are corrupt, paid by pharma, and appear to have no scruples whatsoever (think Nemeroff et al.), I agree: opening a ‘peer reviewed’ journal must feel like opening the next issue of the National Enquirer for medical direction and deciphering of “studies”. If I were a doctor, I’d glean as much info from patients and patient anecdotal stories as possible, for those ARE the true long term studies; listen to them.
Anonymous, you could start your own blog if you want. I’ll host it for free for you, put a WordPress shell on it, and coach you through a bit of the posting suggestions if you will, in exchange, coach two more people after you get up to speed. Would probably only take you a short time to figure out the software. Lord knows you have enough content.
Mickey, I want to point out one issue with the placebo vs. Paxil chart. Although the outcome lines are just about identical, what about when the kids come back off the Paxil? The SSRIs have been shown to have bad withdrawal effects and to increase the incidence of further relapse, so maybe the noneffect is actually worse than you are saying? If you project out past the 8 week study, then probably the Paxil group does a whole lot worse over the next 1–5 years.
Corinna,
I’ve never written a prescription for Paxil. Well, that’s not exactly right. I’ve refilled a few for people who needed more to continue a slow detox. Some can stop it with ease, but many can’t and no matter how slowly you taper, they still have a hard time. I learned about Paxil withdrawal from a friend’s wife who had a really hard time early on. Withdrawal can happen with any of the SSRI/SNRIs, but in my experience, Paxil and Effexor are the big culprits. That graph is theirs, not mine. I just don’t prescribe it.
Dr. Mickey, please add Pristiq and Cymbalta to your list of prime culprits for withdrawal. Should be no surprise after our experience with Effexor. I just got off the phone raking some hapless person at the FDA over the coals for the lack of dosage range in Pristiq that enables tapering. (Pristiq comes in only 2 dosages, normal and excessive, and cannot be split because it’s time-release coated.)
Although Pristiq was approved by the FDA in 2008, the FDA is still in “discussion” with Pfizer about providing other dosages. The situation has to be “studied.”
Anonymous, I also support an honest primary literature. I believe it would reveal the risks of antidepressants to be much greater than presently supposed. It would also educate clinicians in providing safer and more effective treatments with drugs and devices of all kinds.
However, we live in an era where the free market is supposed to be a corrective to bad products, the idea being that if enough people are maimed or killed by them, surviving consumers will stop buying them. The “buyer beware” principle of the free market also permits medical journals to publish garbage and let readers sort it out.
If doctors don’t want this kind of product from journals, they need to vote as consumers. Since most are forced to subscribe via their association dues, the target of consumer outrage would be the publishers, the associations. Like stockholders, clinicians need to bring up the hard questions at annual meetings, form voting blocs, and clean the slobs out, from staff to board of directors.
Alto,
Pristiq is a bad one for withdrawal, sure enough. I had a lady who could not tolerate coming down on the dose for the reason you mentioned. So we moved to Effexor, which went fine, and then after two weeks, came down successfully.
It makes sense when you look at the compounds. I don’t know how generalizable it would be.
Yes, I made the suggestion to switch to Effexor in my Pristiq withdrawal tips. Was the conversion rough at all?
Anonymous, why would children be undertreated if Paxil weren’t an option? You’ve always got Prozac. Paxil is nothing special as an antidepressant, and there’s evidence it incurs more side effects than others.
Sorry, this case came before I knew about survivingantidepressants. Also, it’s interesting but I don’t really run into many folks on Pristiq or Cymbalta. And the only people I see on Paxil are people who’ve been on it a long time and haven’t been able to stop it. That’s hardly data, but it’s true…
Oh, I meant were there any problems converting the person from Pristiq to Effexor?
Alto,
Absolutely seamless. I was surprised – actually, we both were. Again, it was an n=1 clinical trial…
Thanks, Dr. Mickey.
One of my brilliant members just posted about Pristiq-Effexor equivalency here http://survivingantidepressants.org/index.php?/topic/876-tips-for-tapering-off-pristiq-desvenlafaxine/page__view__findpost__p__31850