a painful re·al·i·za·tion…

Posted on Friday 25 September 2015

    re·al·ize  /ˈrēəˌlīz/
      become fully aware of [something] as a fact; understand clearly
[I’ve preferred to think of to re·al·ize as "to make real"]

I had no conscious intention of parsing the verbs to ra·tion·al·ize and to re·al·ize in back-to-back posts. I wrote the last post, then read today’s installment of America’s Most Admired Lawbreaker, the serialized story of Alex Gorsky, Johnson & Johnson, and Risperdal in the Huffington Post [now at the 11th of 15 daily chapters]. The story has a specific meaning for me, so reading it isn’t just informative – it’s the stimulus for a lot of memories, some of them painful. I don’t know if it’s this way for everyone, but medical training had a massive impact on my relationship with my own mind. A medical error can have the gravest of consequences – and it’s inconceivable that one won’t make mistakes. So when it happens, you remember what you were thinking while making the error, and you’re faced with the dramatic consequences of your wrongness.

The first real experience of clinical medicine in my medical school was being on an autopsy team as part of the Pathology Course. My initial autopsy was on a twelve-year-old boy who had come into the hospital with a raging case of pneumonia and within a day and a half was dead. He had been seen by every service in pediatrics, but nobody knew what was wrong. So besides the pathology resident doing the autopsy and us greenhorn second-years, there were pediatricians and pediatric surgeons filling the suite. The boy had an anomalous appendix that wasn’t where it was supposed to be. It had ruptured towards the back, and the infection had gotten behind the abdominal lining, traveled up through the diaphragm into his chest, and presented as pneumonia. He had none of the usual symptoms of appendicitis.

While the surgeons had considered that possibility, one of the senior residents had nixed the idea of an exploratory laparotomy. And when it became apparent during the autopsy that that would have been the only thing they could’ve done that might have saved the boy, I could see his despair – a quiet depth of despair I’d never seen before. And through the years, as I made my own errors, I learned what that felt like from the inside. It seems that the more you learn, the greater the requirement to be skeptical about your own thoughts: "Am I rationalizing?" comes to the fore. And in my later profession as a psychotherapist, that skepticism has to take center stage. Every thought is tentative, only a hypothesis, until proven otherwise.

So back to the thread. It can’t be lost on anyone that a psychoanalyst/psychotherapist like me would have a built-in bias as a retired guy looking into the likes of biomedical psychiatry and psychopharmacology. So when I started seeing patients in a general psychiatric clinic and was appalled at the medication regimens people were on, or when the disturbing articles about prominent psychiatrists began appearing in reports on Senator Grassley’s investigations, those things certainly bothered me [to say the least]. But I’m a biased observer. It looked like something was terribly wrong [and by the way, some of the names I was reading were people I knew, or knew about]. But was I ra·tion·al·iz·ing based on my own inner workings? And there was another piece. It was a painful story. This was my profession we were talking about. I’m not an anti-psychiatrist. I’m a psychiatrist, and this was feeling like a pretty painful realization.

When I ran across TMAP and the other J&J antics, I got pretty intrigued. It was so widespread, reported as almost Machiavellian. Somewhere along the way, I connected with whistle-blower Allen Jones and read the Rothman Report, an amazing must-read document written for Allen’s upcoming TMAP trial. Almost without thought, I asked my wife if she was up for a trip to Austin, Texas [she’s usually up for a trip to anywhere new]. She said "sure," and so we were off to Austin for a week-and-a-half trial. It was an odd impulse, and even on the plane I wondered why I was going. But by the end of the first day of the trial, I knew the answer. I was there to see if it [all the deceit I’d been reading about] was real [to re·al·ize as in "to make real"]. And it was real with a capital "R"! I guess I’m an evidence-based type after all. Parenthetically, I think it was that same need to make it real that drove me in our recent Paxil Study 329 article.

Steven Brill’s series [America’s Most Admired Lawbreaker] is really top-notch – actually also a must-read. But a lot of it is about the chess moves by the J&J lawyers and the lawyers on the various other sides. And it’s about the big guys at the top. Going to the trial, though, the story was populated, and it was the testimony of the witnesses in the lower ranks that made it all so very real for me. I just happen to have posted the transcripts of the whole trial, indexed for easy reading, right here on 1boringoldman. The main linked index is below, with a few highlighted to focus your reading [don’t miss Moake or Jones]:



State v. Janssen Vol 1
State v. Janssen Vol 2
01/10/2012   Cynthia O’Keeffe The Opening Statement for the State of Texas Civil Medicaid Fraud Division.
01/10/2012   Tom Melsheimer The Opening Statement given by Whistleblower Allen Jones’ Lawyer.
01/10/2012   Steve McConnico The Opening Statement for the defendants – Janssen Pharmaceutica et al.
01/10/2012   Thomas Anderson Mr. Anderson was a Product Manager at Janssen during the time Risperdal was "launched" in 1993.
01/10/2012   Margaret Hunt Ms. Hunt is a fraud investigator for the Civil Medicaid Fraud Division of the Texas Attorney General’s Office.
State v. Janssen Vol 3
01/11/2012   Alexander Miller Dr. Miller is in the Department of Psychiatry at the San Antonio Texas Health Science Center – a member of the TMAP team.
01/11/2012   Steven Shon Dr. Shon was Medical Director of the Texas Department of Mental Health and Mental Retardation – an integral part of TMAP.
01/11/2012   Gary Leech A Janssen employee who was the medical science liaison for Texas, Oklahoma, Arkansas, Louisiana, and New Mexico [1995–2003].
01/11/2012   James Van Norman Dr. Van Norman is a public psychiatrist currently with Austin Travis County Integral Care, a community mental health center.
State v. Janssen Vol 4
01/12/2012   N. Bursch-Smith Janssen employee from the Department of Reimbursement Management.
01/12/2012   Bill Struyk Former Janssen employee from the Department of Reimbursement Management [1996-1997].
01/12/2012   Allen Jones Pennsylvania Investigator who blew the whistle on TMAP and filed this suit.
01/12/2012   Laurie Snyder Janssen employee in the Department of Public Health Systems & Reimbursement management.
01/12/2012   Susan Stone Dr. Stone worked at the TDMHMR at the time the Texas Medication Algorithm Project [TMAP] was started.
01/12/2012   Steven Schroeder He is the president and CEO of the Robert Wood Johnson Foundation.
01/12/2012   Percy Coard Janssen employee who was a District Manager for hospital sales and later a Public Health Systems & Reimbursement manager.
State v. Janssen Vol 5
01/13/2012   Arnold Friede An expert witness from New York testifying for the plaintiff, specializing in FDA Law.
State v. Janssen Vol 6
01/17/2012   Tiffany Moake Ms. Moake was a Sales Rep for Janssen from 2002-2004 in the San Antonio area.
01/17/2012   Shane Scott Mr. Scott was a Janssen employee and was Ms. Moake’s District Sales Manager.
01/17/2012   Bruce Perry Dr. Perry was an expert witness for the Plaintiffs – a Child Psychiatrist with Baylor Medical School.
01/17/2012   Tone Jones Mr. Jones was Janssen’s District Sales Manager for the Houston area.
State v. Janssen Vol 7
01/18/2012   Tone Jones [continued]
01/18/2012   Billy Milwee Dr. Milwee is in charge of the Texas Medicaid Formulary Program.
01/18/2012   Valerie Robinson Dr. Robinson worked as a Child Psychiatrist in Fort Worth TX, working with Foster Children.
01/18/2012   Sharon Dott Dr. Dott is a psychiatrist in the Galveston area working in public facilities.
01/18/2012   Scott Reines Dr. Reines is an MD/PhD Janssen scientist who was in charge of Clinical Trials and FDA submissions.
01/18/2012   Jos. Glenmullen Dr. Glenmullen was an expert witness for the plaintiff – on the faculty of Harvard University.
Mickey @ 3:00 PM

a creative ra·tion·al·i·za·tion…

Posted on Friday 25 September 2015

    ra·tion·al·ize  /ˈraSHnəˌlīz/
      attempt to explain or justify [one’s own or another’s behavior or attitude] with logical, plausible reasons, even if these are not true or appropriate
[I’ve preferred to think of to ra·tion·al·ize as "to start with a conclusion"]
by Anne R. Cappola and Garret A. FitzGerald
September 24, 2015

The primary interest of the biomedical scientific endeavor is to benefit patients and society. Frequently, this primary interest coincides with secondary interests, most commonly financial in nature, at the interface of the investigator’s relationship with a private sponsor, typically a drug or device company or, increasingly, venture capital firms. Academia and the public have become sensitive to how such a secondary interest might be unduly influential, biasing the interpretation of results, exposing patients to harm, and damaging the reputation of an institution and investigator. This concern has prompted efforts to minimize or “manage” such “conflicts of interest” resulting in a plethora of policies at both the local and national level. Although these policies are often developed in reaction to a limited number of investigators, once introduced, they apply to all. Given the broad array of stakeholders, the diversity of approaches, and the concern that such policies might restrain innovation and delay translation of basic discoveries to clinical benefit, the Institute for Translational Medicine and Therapeutics at the University of Pennsylvania recently convened an international meeting on conflict of interest. Several themes emerged…
Well, since we know where this is headed, why not jump on ahead right off the bat and get the conclusion out of the way?…
Conflicts of Interest:
    Dr Cappola
    • reports receiving consulting fees from Biomarin, Mannkind Corporation, and Takeda.
    Dr FitzGerald
    • reports being the McNeil Professor of Translational Medicine and Therapeutics, a council member of the American Association for the Advancement of Science, and a member of the National Academy of Medicine biomarker committee;
    • receiving a stipend for being co-chair of the advisory board for Science Translational Medicine;
    • grants from the Harrington Family Foundation and Eli Lilly;
    • consulting fees from Calico and Pfizer, Eli Lilly, Glenmark Pharmaceuticals, and New Haven Pharmaceuticals;
    • serving as chair for the Burroughs Wellcome Foundation review group on regulatory science awards, the Helmholtz Foundation advisory board for the network of cardiovascular science centers, and the PhD program committee of the Wellcome Trust, a section committee of the Royal Society;
    • and serving on the advisory boards of the Oklahoma Medical Research Foundation and King’s Health Partners in London. He also serves on the advisory boards of the Clinical and Translational Science Awards held by the University of Connecticut, Harvard, the Medical University of South Carolina, Duke University, and the University of California at San Francisco.
    • This work is supported by a grant [UL1 TR000003] from the National Institutes of Health.
Now that the authors’ cards are on the table, we can actually savor the argument that follows, which deserves the label a creative ra·tion·al·i·za·tion:
First, the term conflict of interest is pejorative. It is confrontational and presumptive of inappropriate behavior. Rather, the focus should be on the objective, which is to align secondary interests with the primary objective of the endeavor—to benefit patients and society — in a way that minimizes the risk of bias. A better term — indicative of the objective — would be confluence of interest, implying an alignment of primary and secondary interests. In this regard, the individuals and entities liable to bias extend far beyond the investigator and the sponsor; they include departments, research institutes, and universities. The potential for bias also extends to nonprofit funders, such as the National Institutes of Health and foundations, as well as to journals that might, for example, generate advertising revenue from sponsors…
A conflict of interest implies bias. That’s what the term means. So the authors say that for it to have that meaning makes it a biased term – pejorative. The solution is simple. Remove the bias from the bias, and call it a confluence of interest. Now it’s not pejorative anymore. In fact, it’s downright laudable. And so bias isn’t bias after all, and everything is all better.

In 1936, Anna Freud wrote The Ego and the Mechanisms of Defense to flesh out her father’s ideas about how the mind gets around unsavory motives. She devoted a whole chapter to intellectualization and rationalization, a favorite of adolescents. Another way to look at it: there is a cognitive leap in adolescence [Piaget] when the child can finally use formal logic and think in the same way as his/her parents [stripping them of the power of a superior intellect]. A smart adolescent can justify [rationalize] anything, and delights in endless mind games, to the consternation of parents through the ages. For a time, it’s a new tool to get what you want, or to enter the power struggle phase of growing up, rather than a tool for understanding. And some people never make that latter jump, and rationalize their way through life.

Ms. Freud could’ve used this totally silly article as her prime example…
Mickey @ 9:44 AM


Posted on Thursday 24 September 2015

I’m not old enough to have been around during the days of Bromides [Nervine], or Barbiturates, or Meprobamate [Miltown], or Methaqualone [Quaalude]. I grew up in the age of the Benzodiazepines [Librium, Valium, Klonopin, Xanax]. We all know what they do, so we don’t have to have any clinical trials. We all know they’re effective short term for anxiety, and we all ought to know what’s up ahead with longer term [or even medium term] use. These are the "damned if you do and damned if you don’t" drugs, and the skill of the everyday clinician can be partially gauged by his/her ability to use them [or not use them] effectively without causing future problems. Some say never use them. Others ignore the warnings. But this post isn’t about that. It’s about something else:
    She was brought to the clinic by her aunt who was taking care of her temporarily. She was a woman in her fifties with a cast on her lower leg from a fall. She was calm, alert, but couldn’t answer many questions. She was blitzed. She told me she’d fallen and broken her hip. But she knew neither the date nor the season. By history, she was obviously the ‘black sheep’ of the family – a failed marriage, no contact with her kids, psych hospitalizations, multiple rehabs for alcohol, benzodiazepine detox, etc. – moving from family member to family member. Her aunt had a piece of paper with her medications written out neatly:
    • Seroquel 600 mg/day
    • Trazodone 450 mg/day
    • Depakote 2.5 Grams/day
    • Neurontin [I forget how much too much]/day
    • Cogentin 4 mg/day
      among other things…
    …an outrageous cocktail! I can think of no medical/psychiatric condition where that’s an appropriate regimen. No wonder she fell and broke her leg. No wonder that she got her injury wrong. Little wonder that she didn’t know the season [I’m surprised she even knew her name]. Where does one even start? So I saw her at the end of each day I was in the clinic, and tried to figure out what I could get away with coming down on without precipitating some withdrawal state. Over a couple of months, I got her down to…
    • Seroquel 200 mg/day
    • Depakote 500 mg/day
    • Cogentin 4 mg/day
    …without incident. But she was still pretty fuzzy [season "yes" – month "no"]. That was two weeks ago. I had noted that her pupils were dilated at every visit, but wanted to decrease the Seroquel before taking on the Cogentin. This time they were so widely dilated I could barely tell her eye color [why it wasn’t that dramatic earlier isn’t clear to me], and she complained about her vision being blurred. So I stopped the Cogentin by coming down a mg every couple of days. Yesterday, I had stepped out to return a phone call. When I got back, the nurse had put her and her aunt in the office because she was so agitated. She was in the middle of a full-scale hyperventilation episode with carpopedal spasm – throwing her glasses across the room, breaking them, and yelling about…well, about everything.

    It took a while to get her breathing slowed. In the barrage of things that followed [a litany of a lifetime of woes and symptoms], I noticed that her pupils were down to size; that she was fully oriented, with intact memory, past and present; and that she was mad as hell about many [if not all] things. As she calmed down, I could see that she had some subtle but nonetheless definite involuntary movements of her tongue. In addition, her legs were never totally still.

    She knew about both things: "My restless legs are back – pacing all night. I haven’t slept for four days!" "It’s that Tardive thing I get from the medicine. It comes and goes [pointing to her tongue]." So I had unmasked her Akathisia and her Bucco-Lingual symptoms by dropping the neuroleptic dose and discontinuing the Cogentin too quickly. At least her cognitive apparatus was working – in fact, working overtime.

Yesterday was actually my first opportunity to take a history, as she had been non compos mentis earlier. I can’t discuss it here except to say that the presumptive diagnosis is Borderline Personality Disorder. There was no evidence of a major affective or psychotic disorder. That this patient was overmedicated goes without saying. In an earlier era, the overmedication might have happened with the anti·anxiety drugs. Such patients are always anxious, and when people begin to treat them with medication there is a tendency for doses to go up and up. It’s never enough. In her case, besides the pan·anxiety, she exhibits the now discarded diagnostic criterion from the DSM-III – intolerance of being alone. When she’s living alone, she has great difficulty sleeping, and a lot of the overmedication has to do with that complaint. But now there’s something else. Over the years, her anxiety and insomnia have been treated with various antipsychotic medications, and she now has Akathisia and involuntary tongue movements suggesting Tardive Dyskinesia, emergent on reducing the dose and the Cogentin. I won’t know for sure for a while, but I think this might well be the kind that doesn’t go away – even if I can get her off of the Seroquel.

These patients are very difficult and are often overmedicated [and have been as long as there have been medicines] – with all the medications listed in the first paragraph. That’s a bad thing. She’s gotten medications that are used in conditions she doesn’t have [Depakote and Neurontin]. That’s a bad thing too. But this patient has been given escalating doses of antipsychotics, and now she may well have signs of a permanent iatrogenic neurological condition called Tardive Dyskinesia. And our literature says that’s a good idea – using Atypical Antipsychotics in Borderline Personality Disorder – based on short-term Clinical Trials funded by industry. That’s a very bad thing, maybe a forever thing:


[see Atypicals in Borderline Personality Disorders, an anachronism…, Academic Industrial Complex II…, Academic Industrial Complex III…, and not really given the chance…] These studies came from Dr. Charles Schulz‘s Department at the University of Minnesota. Dr. Schulz has recently stepped down [or been stepped down] in the wake of the Dan Markingson affair – essentially being accused of running an industry-funded Clinical Trial Mill. We know a lot about the Borderline conditions, and none of what we know would suggest to me that using these medications might be a good idea. This case is an example of why. She was on a maxi-dose to treat anxiety and insomnia, giving us now two disorders to deal with.

With these patients, there is often nothing right to do. If you don’t treat the anxiety, they act out in dangerous ways. If you do treat it, they overdose or take too much and still want more. They defeat most treatments, and yet they need to be treated. I’m not a bit surprised that they respond to Atypical Antipsychotics in short-term trials. But like anything in these cases, the drugs run out of juice, and so up goes the dose. We know that pattern from their general response to any and all treatments. And these drugs can leave permanent sequelae for no particular gain that I can see. We can do so much better than this, even with these difficult cases…
Mickey @ 7:52 PM

a breakthrough·freak…

Posted on Tuesday 22 September 2015

Tom Insel explains why he’s ready to give Silicon Valley a try.
MIT Technology Review
By Antonio Regalado
September 21, 2015

We are at a really interesting moment in time. Technology that already has had such a big impact, on entertainment and so many aspects of our lives, can really start to change health care. If you ask the question “What parts of health care can technology transform?”–mental health could be one of the biggest.

Technology can cover much of the diagnostic process because you can use sensors and collect information about behavior in an objective way. Also, a lot of the treatments for mental health are psychosocial interventions, and those can be done through a smartphone. And most importantly, it can affect the quality of care, which is a big issue, especially for psychosocial interventions.

What do you mean by treating over the phone? One of the best treatments for depression is cognitive behavior therapy. It’s building a set of skills for managing your mood. You can do it with a phone as well as face to face. A lot of people with severe depression or social phobia or PTSD don’t want to go in to see someone. This lowers the bar.

Is it possible to diagnose mental illness with a phone? I’d say you can collect information over the phone that can help people manage their own treatment. Your question rests on a paradigm that is completely shifting. The old paradigm is you go to the doctor and they write a prescription. Whether you call it a diagnosis or just identifying the issue, there is an awful lot that can be done online. There is an attachment for your smartphone that can see the tympanic membrane, and pediatricians can make a diagnosis [of ear infection] online. It’s a world where you want to get the right treatments at the right time for the right people. As a consumer, you are close to the source of the information. All of this is a different paradigm that we are moving into.

Is Alphabet’s approach to mental illness going to be primarily technological or biological? I don’t know that. We are going to explore what the opportunities are. We know their sweet spot is in data analytics. What they do really well is figure out how to analyze data. The opportunity is to take that skill and answer biological questions. What that means in terms of what projects the life science team takes on in mental health is totally undefined. Part of my move there is to figure it out.
As a medical student in the 1960s, I was in a new place and the only people I knew were other medical students. A couple of my early friends were local, had grown up in the town. Through them I met their longtime friends who weren’t in medicine. One such person was the son of a successful businessman, and his path was set for life. But, in spite of my own aversion to business, we really hit it off. One day, he explained why, and gave me a phrase that’s still with me. He casually quipped, "You’re a breakthrough·freak – just like me." I’d never thought of it that way, but it was completely on target. I read science fiction [the sciency kind]. I kept up with the latest science advances and technologies, and fantasized about where they might lead. I was in medical school as a prelude to a research career. He had casually nailed my diagnosis.

Much later, I was forced to practice medicine by being drafted into the Air Force after an Internal Medicine Residency and an NIH Research fellowship. Within a short period of time, I realized that practicing medicine was not only relevant and engaging, it got me out of my head. Did I want to do something that actually mattered, or spend my life being just a breakthrough·freak? How that had come to be and how it translated into the rest of my life is another story. But for this moment, my point is that I know a breakthrough·freak when I see one. And Tom Insel has a terminal case. I hasten to add that there’s nothing wrong with being a breakthrough·freak. Probably most breakthroughs are made by breakthrough·freaks repurposed as visionaries [and I’d bet that Google is filled to overflowing with examples].

Having read Insel’s blogs for a number of years, I’ve watched him bounce from thing to thing, leaving a trail of projects in a string behind him. Google is actually a much better fit for a serial breakthrough·freak than the NIMH. He will likely be part of a think·tank rather than the man in charge, and that might just work [though I’m betting there will be a bunch of pop-psychology apps coming our way]. But maybe he’ll land on something visionary after all…
Mickey @ 10:00 PM

just a note:…

Posted on Monday 21 September 2015

Just a quick note to say that if you’re reading this blog and you’re not reading Steven Brill’s America’s Most Admired Lawbreaker in the Huffington Post about Alex Gorsky, J&J, Risperdal, and related matters, you’re making a big mistake. Today was Day 7 of 15, and he’s getting to the good parts. It’s a story I know well, yet I haven’t had a moment’s boredom. He’s doing a mighty fine job of telling a story every American ought to read. Don’t miss it!

Mickey @ 5:43 PM

lost its mojo…

Posted on Monday 21 September 2015

One might think that with all of the supportive media coverage our Study 329 article has received, I would be able to shake off the response from lead author, Martin Keller, reproduced from Retraction Watch in the last post [keller responds…], or his comment in The Chronicle of Higher Education:
Dr. Keller contacted The Chronicle on Wednesday to insist that the 2001 results faithfully represented the best effort of the authors at the time, and that any misrepresentation of his article to help sell Paxil was the responsibility of Glaxo. "Nothing was ever pinned on any of us," despite various trials and investigations, he said. "And when I say that, I’m not telling you we’re like the great escape artists, that we’re Houdinis and we did something wrong and we got away with the crime of the century. Don’t you think if there was really something wrong, some university or agency or something would have pinned something on us?" In what he described as his first effort to speak publicly about the matter, Dr. Keller said his critics also have financial and professional motives for amplifying criticisms, including lawyers representing Paxil plaintiffs and professors seeking their own records of journal publication…
I had a somewhat similar reaction to Dr. Jeffrey Lieberman’s comment:
“The group is a self-appointed watchdog,” Jeffrey Lieberman, chair of psychiatry at the Columbia University College of Physicians and Surgeons, told BuzzFeed News. “One wonders what the motivation is, and how objective they’re going to be.”
So I spent a day or so with a background chorus of refutations playing like a scratchy record in my mind until they played themselves out. Nobody reading this blog needs to hear them again. You could probably reproduce them yourselves, and I’ve certainly filled up enough pages saying them. After the din in my head subsided, I was left with just one clear note that I wanted to respond to. It is what has been characterized as the fallacy of an appeal to authority:
“The study authors comprised virtually all of the academic researchers studying the treatment of child depression in North America at the time”
In this case, the BY·LINE is indeed full of experts:
But in spite of Dr. Keller’s claims otherwise, reading the raft of subpoenaed documents and the depositions, the author·ity appears to have rested on the shoulders of ghostwriter Sally Laden and perhaps the last two listed authors, both of whom were GSK employees [Deposition of Sally Laden, 2007]:
    QUESTION: The document that I have marked as Exhibit seven is the final clinical report for Study 329, is that correct?
       ANSWER: Yes
    QUESTION: Is this a document that you were referring to that you got the data from?
       ANSWER: I don’t recall what specific document I did receive, whether it was this one. I mean, yes, this would be what I would have gotten. I don’t recall getting it.
    QUESTION: You don’t recall ever receiving it, but you know you got it, right?
       ANSWER: Yes, I got it. Yes, I don’t recall receiving it.
    QUESTION: This provided you with information that you then utilized to prepare the first draft of the manuscript for Study 329?
       ANSWER: Yes
    QUESTION: Was it your responsibility alone to create the first draft of Study 329 or did you get help from some of your colleagues?
       ANSWER: I believe I created it on my own.
    QUESTION: Did Martin Keller tell you what to put in the first draft?
       ANSWER: I don’t recall. I don’t think I had any conversation with him until we were, you know, afterwards.
    QUESTION: After you prepared the first manuscript?
       ANSWER: To the best of my recollection, yes.
    QUESTION: In here you list eight main outcome measures, correct?
       ANSWER: Yes
    QUESTION: And you can’t tell from reading – a reader could not distinguish which are these – whether any or all of them are primary or secondary?
       ANSWER: Correct
    QUESTION: My question was, do you know whose idea it was to not distinguish between primary and secondary efficacy measures?
       ANSWER: A reader cannot. This was a first draft, so this came straight from me. This was, I guess, my interpretation. I’m remembering this may have been my interpretation of the data.

This is only one small example of the extent to which the appeal to authority has pervaded our literature. The experts are listed on the BY·LINE, but the work that matters is produced by the sponsor. In this case, the subjects were recruited and underwent the study at the institutions of the listed authors, but the article was drafted by the sponsor and written by a paid writer. This kind of "guest" authorship is common – the experts are involved, but not in the authorship as we understand the term. There are many other examples where the sponsors had already completed the papers before even recruiting the academic "guest authors" to sign onto the BY·LINE.

Similar experts with [financial] PHARMA COI are everywhere: CME presentations; Speakers’ Bureaus; Review Articles; the list gets longer by the year. A recent NEJM series [wtf?…, wtf? for real…] argued that their longstanding policy of banning authors with these COI from review articles should change [in part because there’s a paucity of "clean" experts]. The DSM-5 Revision was done using panels of experts heavily laden with COI-tainted members [must be crazy…]. Expert "panels" produced the guidelines and algorithms for the infamous TMAP scam in Texas [1999…]. It appears that we have developed a "cult" of experts [called Key Opinion Leaders by the pharmaceutical marketers].

The Nizkor Project [a study of logical fallacies] lists among the instances where an appeal to authority is considered a logical fallacy:
  • If there is evidence that a person is biased in some manner that would affect the reliability of her claims, then an Argument from Authority based on that person is likely to be fallacious. Even if the claim is actually true, the fact that the expert is biased weakens the argument. This is because there would be reason to believe that the expert might not be making the claim because he has carefully considered it using his expertise. Rather, there would be reason to believe that the claim is being made because of the expert’s bias or prejudice.
  • If a person makes a claim about some subject outside of his area(s) of expertise, then the person is not an expert in that context. Hence, the claim in question is not backed by the required degree of expertise and is not reliable. It is very important to remember that because of the vast scope of human knowledge and skill it is simply not possible for one person to be an expert on everything. Hence, experts will only be true experts in respect to certain subject areas. In most other areas they will have little or no expertise.
In this case, when we were able to directly access the Individual Participant Data [IPD] and the Case Report Forms [CRFs], using the a priori Protocol as our guide for the predefined Primary and Secondary Outcomes and following their stated Statistical Analysis Plan, we could not confirm their claim of efficacy or of safety. Quite the opposite. The group listed on the BY·LINE may well be experts of one sort or another, but they are neither unbiased nor experts in analyzing Clinical Trial data. The notion that one can introduce a completely new outcome analysis at the end of a Clinical Trial lasting three years, whether before or after breaking the blind, and expect to be taken seriously in perpetuity is ludicrous – no matter what the explanation for the change. It’s equally bizarre to query 27 outcome measures [in the CSR] and ignore correcting for multiple comparisons [a sketch of that arithmetic follows below]. Likewise, the idea that Dr. Keller can claim that the article was written or analyzed by the experts on the BY·LINE, when the hired medical writer testified that she wrote the first draft [and others] from an industry-supplied summary, is equally absurd. And our paper was actually soft on some of the statistical manipulation along the way, but these comments apply in that arena as well.
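To make the multiple-comparisons point concrete, here’s a minimal sketch in Python [my own illustration, not part of our published analysis – only the 27 outcomes and the conventional 0.05 threshold come from the text above]:

```python
# With 27 outcome measures each tested at alpha = 0.05, the chance of at
# least one spuriously "significant" result under the null hypothesis is
# substantial [assuming independent tests, for simplicity].
alpha, m = 0.05, 27
familywise_error = 1 - (1 - alpha) ** m
print(f"P(at least one false positive) = {familywise_error:.2f}")  # ~0.75

# Holm's step-down procedure, one standard correction for this problem:
def holm(pvals, alpha=0.05):
    """Return a list of booleans marking which hypotheses survive."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    keep = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):  # threshold shrinks step by step
            keep[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return keep
```

In other words, with 27 uncorrected looks at the data, a "significant p value" or two is more likely than not, even when the drug does nothing at all.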

I actually sometimes feel sorry for some of the people on that BY·LINE. For some, I expect their error was in assuming that the analysis of the study was properly conducted. But my empathy is short-lived. I think they must’ve felt like it was a double return for their efforts. By doing the study on their site, they raised money for their departments AND they added an article to their respective resumes. What they got was anything but a boost to their status as experts. They delegated their author·ity to other experts who were operating deep in the domain of fallacy – then compounded, and continue to compound, the problem by their silence.

The appeal to authority argument has lost its mojo, not just in this trial, but in Clinical Trials in general. In our article, we concluded:
… As with most scientific papers, Keller and colleagues convey an impression that “the data have spoken.” This authoritative stance is possible only in the absence of access to the data. When the data become accessible to others, it becomes clear that scientific authorship is provisional rather than authoritative.
Mickey @ 10:00 AM

keller responds…

Posted on Friday 18 September 2015

This is cross-posted from Dr. David Healy’s web site with his permission…

The Letter below from Marty Keller and colleagues was sent to many media outlets, to Retraction Watch, and to professional organizations on Wednesday. Paul Basken from the Chronicle of Higher Education asked me for a response, which I sent about an hour after receiving the letter. This response is from me rather than the 329 group. This and other correspondence features and will feature on Study329.org.

One quick piece of housekeeping. Restoring Study 329 is not about giving Paroxetine to Adolescents – it’s about all drugs, for all indications, across medicine, and for all ages. It deals with the standard Industry MO of hyping benefits and hiding harms. One of the best bits of coverage of this aspect of the story yesterday was in Cosmopolitan.

Letter From Keller et al


Martin Keller

Nine of us whose names are attached to this email (we did not have time to create electronic signatures) were authors on the study originally published in 2001 in the Journal of the American Academy of Child and Adolescent Psychiatry entitled, “Efficacy of paroxetine in the treatment of adolescent major depression: a randomized controlled trial,” and have read the reanalysis of our article, which is entitled, “Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence”, currently embargoed for publication in the British Medical Journal (BMJ) early this week. We are providing you with a brief summary response to several of the points in that article with which we have strong disagreement. Given the length and detail of the BMJ publication and the multitude of specific concerns we have with its approach and conclusions, we will be writing and submitting to the BMJ’s editor an in-depth letter rebutting the claims and accusations made in the article. It will take a significant amount of work to make this scholarly and thorough, and we do not have a timetable; but that level of analysis by us far exceeds the time frame needed to give you that more comprehensive response by today.

The study was planned and designed between 1991-1992. Subject enrollment began in 1994, and was completed in 1997, at which time analysis of the data commenced. The study authors comprised virtually all of the academic researchers studying the treatment of child depression in North America at the time. The study was designed by academic psychiatrists and adopted with very little change by GSK, who funded the study in an academic / industry partnership. The two statisticians who helped design the study are among the most esteemed in psychiatry. The goal of the study designers was to do the best study possible to advance the treatment of depression in youth, not primarily as a drug registration trial. Some design issues would be made differently today — best-practices methodology has changed over the ensuing 24-year interval since inception of our study.

In the interval from when we sat down to plan the study to when we approached the data analysis phase, but prior to the blind being broken, the academic authors, not the sponsor, added several additional measures of depression as secondary outcomes.  We did so because the field of pediatric-age depression had reached a consensus that the Hamilton Depression Rating Scale (our primary outcome measure) had significant limitations in assessing mood disturbance in younger patients. Accordingly, taking this into consideration, and in advance of breaking the blind, we added secondary outcome measures agreed upon by all authors of the paper.  We found statistically significant indications of efficacy in these measures. This was clearly reported in our article, as were the negative findings.

In the “BMJ-Restoring Study 329 …” reanalysis, the following statement is used to justify non-examination of a range of secondary outcome measures:

Both before and after breaking the blind, however, the sponsors made changes to the secondary outcomes as previously detailed.  We could not find any document that provided any scientific rationale for these post hoc changes and the outcomes are therefore not reported in this paper. 

This is not correct.  The secondary outcomes were decided by the authors prior to the blind being broken.  We believe now, as we did then, that the inclusion of these measures in the study and in our analysis was entirely appropriate and was clearly and fully reported in our paper.  While secondary outcome measures may be irrelevant for purposes of governmental approval of a pharmaceutical indication, they were and to this day are frequently and appropriately included in study reports even in those cases when the primary measures do not reach statistical significance.  The authors of “Restoring Study 329” state “there were no discrepancies between any of our analyses and those contained in the CSR [clinical study report]”.  In other words, the disagreement on treatment outcomes rests entirely on the arbitrary dismissal of our secondary outcome measures.

We also have areas of significant disagreement on the “Restoring Study 329” analysis of side effects (which the authors label “harms”). Their reanalysis uses the FDA MedDRA approach to side effect data, which was not available when our study was done. We agree that this instrument is a meaningful advance over the approach we used at the time, which was based on the FDA’s then current COSTART approach. That one can do better reanalyzing adverse event data using refinements in approach that have accrued in the 15 years since a study’s publication is unsurprising and not a valid critique of our study as performed and presented.

A second area of disagreement (concerning the side effect data) is with their statement, “We have not undertaken statistical tests for harms.” The authors of “Restoring Study 329” with this decision are saying that we need very high and rigorous statistical standards for declaring a treatment to be beneficial but for declaring a treatment to be harmful then statistics can’t help us and whatever an individual reader thinks based on raw tabulation that looks like a harm is a harm.  Statistics of course does offer several approaches to the question of when is there a meaningful difference in the side effect rates between different groups.  There are pros and cons to the use of P values, but alternatives like confidence intervals are available.

 “Restoring Study 329” asserts that this paper was ghostwritten, citing an early publication by one of the coauthors of that article. There was absolutely nothing about the process involved in the drafting, revision, or completion of our paper that constitutes “ghostwriting”. This study was initiated by academic investigators, undertaken as an academic / industry partnership, and the resulting report was authored mainly by the academic investigators with industry collaboration.

Finally the “Restoring Study 329” authors discuss an initiative to correct publications called “restoring invisible and abandoned trials (RIAT)” (BMJ 2013;346:f4223). “Restoring Study 329” states “We reanalyzed the data from Study 329 according to the RIAT recommendations” but gives no reference for a specific methodology for RIAT reanalysis. The RIAT approach may have general “recommendations” but we find no evidence that there is a consensus on precisely how such a RIAT analysis makes the myriad decisions inherent in any reanalysis nor do we think there is any consensus in the field that would allow the authors of this reanalysis or any other potential reanalysis to definitively say they got it right.

In summary, to describe our trial as “misreported” is pejorative and wrong, both from consideration of best research practices at the time, and in terms of a retrospective from the standpoint of current best practices.

Martin B. Keller, M.D.
Boris Birmacher, M.D.
Gregory N. Clarke, Ph.D.
Graham J. Emslie, M.D.
Harold Koplewicz, M.D.
Stan Kutcher, M.D.
Neal Ryan, M.D.
William H. Sack, M.D.
Michael Strober, Ph.D.

Boxed harms


David Healy

In the case of a study designed to advance the treatment of depression in adolescents, it seems strange to have picked imipramine 200-300 mg per day as a comparator, unusual to have left the continuation phase unpublished, odd to have neglected to analyse the taper phase, dangerous to have downplayed the data on suicide risks and the profile of psychiatric adverse events more generally, and unfortunate to have failed to update the record in response to attempts to offer a more representative version of the study to those who write guidelines or otherwise shape treatment.

As regards the efficacy elements, the correspondence we had with GSK, which will be available on Study329.org as of Sept 16 and on the BMJ website, indicates clearly that we made many efforts to establish the basis for introducing secondary endpoints not present in the protocol. GSK have been unwilling or unable to provide evidence on this issue, even though the protocol states that no changes will be permitted that are not discussed with SmithKline. We would be more than willing to post any material that Dr Keller and colleagues can provide.

Whatever about such material, it is of note that when submitting Study 329 to FDA in 2002, GSK described the study as a negative study, and FDA concurred that it was negative. This is of interest in light of Dr Keller’s hint that it was GSK’s interest in submitting this study to regulators that led to a corruption of the process.

Several issues arise as regards harms.  First, we would love to see the ADECs coding dictionary if any of the original investigators have one.  Does anyone know whether ADECs requires suicidal events to be coded as emotional lability or was there another option?

Second, can the investigators explain why headaches were moved from classification under Body as a Whole in the Clinical Study Report to sit alongside emotional lability under a Nervous System heading in the 2001 paper?

It may be something of a purist view, but significance testing was originally linked to primary endpoints. Harms are never the primary endpoint of a trial, and no RCT is designed to detect harms adequately. It is appropriate to hold a company or doctors who may be aiming to make money out of vulnerable people to a high standard when it comes to efficacy, but for those interested to advance the treatment of patients with any medical condition it is not appropriate to deny the likely existence of harms on the basis of a failure to reach a significance threshold that the very process of conducting an RCT means cannot be met, as investigators’ attention is systematically diverted elsewhere.

As regards RIAT methods, a key method is to stick to the protocol. A second safeguard is to audit every step taken, and to this end we have attached a 61-page audit record (Appendix 1) to this paper. An even more important method is to make the data fully available, which it will be on Study329.org.

As regards ghostwriting, I personally am happy to stick to the designation of this study as ghostwritten. For those unversed in these issues, journal editors, medical writing companies and academic authors cling to a figleaf that if the medical writer’s name is mentioned somewhere, s/he is not a ghost. But for many, the presence on the authorship line of names that have never had access to the data, and who cannot stand over the claims made other than by assertion, is what’s ghostly.

Having made all these points, there is a point of agreement to note.  Dr Keller and colleagues state that:

“nor do we think there is any consensus in the field that would allow the authors of this reanalysis or any other potential reanalysis to definitively say they got it right”.

We agree. For us, this is the main point behind the article. This is why we need access to the data. It is only with collaborative efforts based on full access to the data that we can manage to get to a best possible interpretation, but even this will be provisional rather than definitive. Is there anything that would hold the authors of the second interpretation of these data (Keller and colleagues) back from joining with us, the authors of the third interpretation, in asking that the data of all trials for all treatments, across all indications, be made fully available? Such a call would be consistent with the empirical method that was as applicable in 1991 as it is now.

David Healy
Holding Response on Behalf of RIAT 329

Mickey @ 6:01 PM

an innovative design…

Posted on Friday 18 September 2015

It has been quite a week, so I haven’t had much else on my mind outside of our own publication [Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence], but I ran across this paper and thought it was pretty interesting – focused on a topic that’s close to what we were writing about:
PLoS Medicine
by Yasmina Molero, Paul Lichtenstein, Johan Zetterqvist, Clara Hellner Gumpert, and Seena Fazel
September 15, 2015

Background: Although selective serotonin reuptake inhibitors [SSRIs] are widely prescribed, associations with violence are uncertain.
Methods and Findings: From Swedish national registers we extracted information on 856,493 individuals who were prescribed SSRIs, and subsequent violent crimes during 2006 through 2009. We used stratified Cox regression analyses to compare the rate of violent crime while individuals were prescribed these medications with the rate in the same individuals while not receiving medication. Adjustments were made for other psychotropic medications. Information on all medications was extracted from the Swedish Prescribed Drug Register, with complete national data on all dispensed medications. Information on violent crime convictions was extracted from the Swedish national crime register. Using within-individual models, there was an overall association between SSRIs and violent crime convictions [hazard ratio [HR] = 1.19, 95% CI 1.08–1.32, p < 0.001, absolute risk = 1.0%]. With age stratification, there was a significant association between SSRIs and violent crime convictions for individuals aged 15 to 24 y [HR = 1.43, 95% CI 1.19–1.73, p < 0.001, absolute risk = 3.0%]. However, there were no significant associations in those aged 25–34 y [HR = 1.20, 95% CI 0.95–1.52, p = 0.125, absolute risk = 1.6%], in those aged 35–44 y [HR = 1.06, 95% CI 0.83–1.35, p = 0.666, absolute risk = 1.2%], or in those aged 45 y or older [HR = 1.07, 95% CI 0.84–1.35, p = 0.594, absolute risk = 0.3%]. Associations in those aged 15 to 24 y were also found for violent crime arrests with preliminary investigations [HR = 1.28, 95% CI 1.16–1.41, p < 0.001], non-violent crime convictions [HR = 1.22, 95% CI 1.10–1.34, p < 0.001], non-violent crime arrests [HR = 1.13, 95% CI 1.07–1.20, p < 0.001], non-fatal injuries from accidents [HR = 1.29, 95% CI 1.22–1.36, p < 0.001], and emergency inpatient or outpatient treatment for alcohol intoxication or misuse [HR = 1.98, 95% CI 1.76–2.21, p < 0.001]. With age and sex stratification, there was a significant association between SSRIs and violent crime convictions for males aged 15 to 24 y [HR = 1.40, 95% CI 1.13–1.73, p = 0.002] and females aged 15 to 24 y [HR = 1.75, 95% CI 1.08–2.84, p = 0.023]. However, there were no significant associations in those aged 25 y or older. One important limitation is that we were unable to fully account for time-varying factors.
Conclusions: The association between SSRIs and violent crime convictions and violent crime arrests varied by age group. The increased risk we found in young people needs validation in other studies.

[cropped to fit the space]
Scandinavia has always been a special place for medical epidemiology. The countries are isolated, self-contained, and they have centralized and detailed record keeping going back to the dawn of time. If you’re looking for twins adopted into different families at birth to look at nature vs. nurture, head for Scandinavia. This is one of those studies – a Sweden-wide, three-year look at the relationship between taking SSRIs and violence. But there’s something more. They used the subjects themselves as their own controls [which struck me as a really bright thought].

Besides having access to a cohort of 8+ million people [~10% on SSRIs], with their prescription records and the public records of every brush with the law, they had some mighty fine computers and statisticians to extract their data and cross-check so many covariates. I couldn’t possibly "vet" all of their analyses. But the core thread is that they isolated periods when patients were "on" SSRIs and when they were "off" the medication, and they compared the rates of conviction for violent crimes "on" and "off" – deriving a Hazard Ratio.
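For the curious, that within-individual design can be sketched as a Cox model stratified on the person. This is purely my own toy illustration [the lifelines library, the column names, and the numbers are my assumptions – not the authors’ code or data]:

```python
# Each person serves as his or her own control: stratifying on a person
# identifier means only within-person on/off contrasts contribute to the
# hazard ratio, so stable traits [sex, genetics, history] drop out.
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical long-format data: one row per on- or off-medication period.
periods = pd.DataFrame({
    "person_id":     [1, 1, 2, 2, 3, 3],
    "on_ssri":       [0, 1, 0, 1, 0, 1],              # exposure in period
    "days":          [400, 250, 300, 500, 200, 350],  # length of period
    "violent_crime": [0, 1, 0, 0, 1, 1],              # conviction in period
})

cph = CoxPHFitter()
# strata=["person_id"] conditions out everything constant within a person.
cph.fit(periods, duration_col="days", event_col="violent_crime",
        strata=["person_id"])
cph.print_summary()  # exp(coef) for on_ssri is the within-individual HR
```

The appeal of the design is that between-person confounders never enter the comparison – only people whose exposure changed [and who had events] inform the estimate.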

While the paper deserves a careful reading, it feels like they’ve done their due diligence. There have been a ton of papers that have tried to debunk the black-box warning about aggressive behavior in some adolescents on these SSRIs – and many of them focus on population studies:
  1. Gibbons RD, Hur K, Bhaumik DK, Mann JJ.
    Arch Gen Psychiatry. 2005 Feb;62(2):165-72.
  2. Gibbons RD, Hur K, Bhaumik DK, Mann JJ.
    Am J Psychiatry. 2006 Nov;163(11):1898-904.
  3. Charles B. Nemeroff, Amir Kalali, Martin B. Keller, Dennis S. Charney, Susan E. Lenderts, Elisa F. Cascade, Hugo Stephenson, and Alan F. Schatzberg
    Arch Gen Psychiatry. 2007 Apr;64(4):466-472.
  4. Nakagawa A, Grunebaum MF, Ellis SP, Oquendo MA, Kashima H, Gibbons RD, Mann JJ.
    J Clin Psychiatry. 2007 Jun;68(6):908-916.
  5. Benji T. Kurian, MD, MPH; Wayne A. Ray, PhD; Patrick G. Arbogast, PhD; D. Catherine Fuchs, MD; Judith A. Dudley, BS; William O. Cooper, MD, MPH
    JAMA: Pediatrics. 2007 Jun;161(7):690-696.
  6. Gibbons RD, Brown CH, Hur K, Marcus SM, Bhaumik DK, Mann JJ.
    Am J Psychiatry. 2007 Jul;164(7):1044-1049.
  7. Gibbons RD, Brown CH, Hur K, Marcus SM, Bhaumik DK, Erkens JA, Herings RM, Mann JJ.
    Am J Psychiatry. 2007 Sep;164(9):1356-1363.
  8. Brown CH, Wyman PA, Brinales JM, Gibbons RD.
    Int Rev Psychiatry. 2007 Dec;19(6):617-631.
  9. Gibbons RD, Segawa E, Karabatsos G, Amatya AK, Bhaumik DK, Brown CH, Kapur K, Marcus SM, Hur K, Mann JJ.
    Stat Med. 2008 May 20;27(11):1814-1833.
  10. Barry CL and Busch SH.
    Pediatrics. 2010 125[1]:88-95.
  11. Gibbons RD, Mann JJ.
    Drug Saf. 2011 May 1;34(5):375-395.
  12. Susan Busch, Ezra Golberstein, Ellen Meara
  13. Robert D. Gibbons, Hendricks Brown, Kwan Hur, John M. Davis, and J. John Mann
    Arch Gen Psychiatry. 2012 Jun;69(6):580-587.
  14. Gibbons RD, Coca Perraillon M, Hur K, Conti RM, Valuck RJ, and Brent DA
    Pharmacoepidemiologic Drug Safety. 2014 Sep 29. doi: 10.1002/pds.3713. [Epub ahead of print]
  15. Christine Y Lu, Fang Zhang , Matthew D Lakoma analyst, Jeanne M Madden, Donna Rusinak, Robert B Penfold, Gregory Simon, Brian K Ahmedani, Gregory Clarke, Enid M Hunkeler, Beth Waitzfelder, Ashli Owen-Smith, Marsha A Raebel, Rebecca Rossom, Karen J Coleman, Laurel A Copeland, Stephen B Soumerai
    BMJ. 2014 348:g3596.
  16. by Mark Moran
  17. by Richard A. Friedman, M.D.
    New England Journal of Medicine 2014 371:1666-1668.
  18. by Marc B. Stone, M.D.
    New England Journal of Medicine 2014 371:1668-1671.
  19. New York Times
    by Richard A. Friedman
    AUGUST 3, 2015
While I’ll readily admit that the findings in this Swedish study fit my own ideas about this topic, I’m impressed that they did a good job in bringing an innovative design using objective measures to bear on the problem. I think it’s a study well worth looking into in more depth. Most of the articles in the list above start with a conclusion and then try to validate it [in my humble opinion]…
Mickey @ 8:00 AM

study 329 x – “it wasn’t sin – it was spin”…

Posted on Thursday 17 September 2015

[Note: the Press coverage of our article is on study329.org, but I wanted to mention the article on Retraction Watch because it has Dr. Martin Keller’s response to our paper with an argument similar to the one below…]

We know from this internal memo and position piece that the initial SKB interpretation of the efficacy results from Study 329 mirrored those reported in our RIAT article:
14 OCT 1998

Please find attached to this memo a position piece, prepared by Julie Wilson of CMAT, summarising the results of the clinical studies in Adolescent Depression.

As you will know, the results of the studies were disappointing in that we did not reach statistical significance on the primary end points and thus the data do not support a label claim for the treatment of Adolescent Depression. The possibility of obtaining a safety statement from this data was considered but rejected. The best which could have been achieved was a statement that, although safety data was reassuring, efficacy had not been demonstrated. Consultation of the Marketing Teams via Regulatory confirmed that this would be unacceptable commercially and the decision to take no regulatory action was recently endorsed by the TAT.

As you will see from the position piece, the positive trends in efficacy which were seen in Study 329 are being published as a poster at ECNP this year, and a full manuscript is in development. Published references will therefore be available for the study. There are no plans to publish data from Study 377.

This report has been prepared for internal use only. Data on File summaries will be prepared and issued once the final reports from the studies have been approved. This position piece will also be available on the Seroxat/Paxil resource database.

TARGET [from the Wilson position piece mentioned above]
To effectively manage the dissemination of these data in order to minimize any potential negative commercial impact
This was, indeed, a negative study, though the published article reached the opposite conclusion [2001]:
Paroxetine is generally well tolerated and effective for major depression in adolescents.
Three years ago, when I reviewed the exchange between Healthy Skepticism and the editor of the publishing Journal of the American Academy of Child and Adolescent Psychiatry [see the lesson of Study 329: naked Emperors, fractious Queens…], I left out parts of the authors’ response to the letter from Jureidini and Tonkin [2003]. This is where they attempt to explain "why" they felt justified in using the non-protocol outcomes:
This study was designed at a time when there were no randomized controlled trials showing antidepressant [tricyclic antidepressant or SSRI] superiority to placebo, so we had no prior data from which to astutely pick our outcome measures. The field has moved strongly away from using the Hamilton Rating Scale for Depression [HAM-D] in adolescent treatment studies and has gone virtually uniformly to using the Children’s Depression Rating Scale-Revised because the latter better and more reliably captures aspects of depression in youth. Surely a national regulatory body charged with approving or not approving a medication for a particular use might well simply say that if a study does not show efficacy on the primary endpoint[s], it is a failed study and secondary outcome measures cannot then be used for approval. However, as scientists and clinicians we must adjudge whether or not the study overall found evidence of efficacy, and we do not have the convenience of falling back on such a simple rule. If we choose wrongly [in whichever direction], we don’t treat depressed children as well as the data would permit. Because we found a clear pattern of significant p values across multiple secondary analyses [recovery as assessed by HAM-D < 8, HAM-D depressed mood item, the Schedule for Affective Disorders and Schizophrenia for School-Age Children depression item, and Clinical Global Impression score at endpoint], we thought and still think this provides significant evidence of efficacy of paroxetine compared with placebo in adolescent depression. Without established reliable measures that distinguish medication responders from nonresponders at the time the study was designed, it is not surprising that the primary measures did not reach significance while other measures did. It still provides a strong “signal” for efficacy…
Creative! I expect that the comments about the CDRS-R [Children’s Depression Rating Scale-Revised] are in the vicinity of reasonable. One wonders why they didn’t say this in the first place, in either the article or the Clinical Study Report. But if you take a look at several previous posts [paxil in adolescents: “five easy pieces”…, an addendum…, and follow-up…], you’ll see a definitive counter to this creative, latter-day response [also apparent in this timeline]:
At the time the 329 authors wrote their response to Jon Jureidini and Ann Tonkin in May 2003, SKB [GSK] had already completed two other Clinical Trials of Paxil in adolescents – one of them actually using the CDRS-R as a primary outcome variable. Those two studies were eventually published [after the patent for Paxil expired], but they were conducted much earlier and SKB [GSK] had the results [top figure]. When they used the CDRS, Placebo actually beat Paxil [bottom figure in yellow]. So at the time of the authors’ response letter, they were justifying what they’d said in Study 329 with an argument they’d already tested and already knew was a dead end [Study 701]:

using MADRS:
by Ray Berard, Regan Fong, David J. Carpenter, Christine Thomason, and Christel Wilkinson
Journal of Child and Adolescent Psychopharmacology. 2006 16[1-2]:59–75.
Conclusions: No statistically significant differences were observed for paroxetine compared with placebo on the two prospectively defined primary efficacy variables. Paroxetine at 20–40 mg/day administered over a period of up to 12 weeks was generally well tolerated.

using CDRS-R:
Journal of the American Academy of Child and Adolescent Psychiatry. 2006 45[6]:709-719.
Conclusions: Paroxetine was not shown to be more efficacious than placebo for treating pediatric major depressive disorder.
It may seem an odd way to end this particular run-on series of blog posts – with a paragraph from a letter now over a decade old. But in study 329 vi: revisited…, I said, "the erroneous conclusion in Keller et al can hardly be chalked up to a mistake. It shows too many tell-tale signs of intention." That’s an opinion, my strong opinion, and I wanted to back it up with an example that didn’t just come from our reanalysis. In the very first real challenge to the article, their 2003 letter to the JAACAP, Jon Jureidini and Ann Tonkin of Healthy Skepticism clearly saw what it has taken fourteen years of dogged persistence to finally insert into the literature in the form of our RIAT article [see the lesson of Study 329: naked Emperors, fractious Queens…]:
The article by Keller et al. [2001] is one of only two to date to show a positive response to selective serotonin reuptake inhibitors [SSRIs] in child or adolescent depression. We believe that the Keller et al. study shows evidence of distorted and unbalanced reporting that seems to have evaded the scrutiny of your editorial process. The study authors designated two primary outcome measures: change from baseline in the Hamilton Rating Scale for Depression [HAM-D] and response [set as fall in HAM-D below 8 or by 50%]. On neither of these measures did paroxetine differ significantly from placebo. Table 2 of the Keller article demonstrates that all three groups had similar changes in HAM-D total score and that the clinical significance of any differences between them would be questionable. Nowhere is this acknowledged. Instead:
  1. The definition of response is changed. As defined in the “Method” section, it has a nonsignificant p value of .11. In the “Results” section [without any explanation], the criterion for response is changed to reduction of HAM-D to below 8 [with a p value of .02]. By altering the criterion for the categorical measure of outcome, the authors are able to claim significance on a primary outcome measure.
  2. In reporting efficacy results, only “response” is indicated as a primary outcome measure, and it could be misunderstood that response was the primary outcome measure. Only in the discussion is it revealed that “Paroxetine did not separate statistically from placebo for…HAM-D total score,” without any acknowledgment that total score was one of the two primary outcome measures. The next sentence is a claim to have demonstrated efficacy for paroxetine.
Thus a study that did not show significant improvement on either of two primary outcome measures is reported as demonstrating efficacy. Given that the research was paid for by GlaxoSmithKline, the makers of paroxetine, it is tempting to explain the mode of reporting as an attempt to show the drug in the most favorable light. Given the frequency with which it is cited in other scientific papers, at conferences and educational functions, and in advertising, this article may have contributed to the increased prescribing of SSRI medication to children and adolescents. We believe it is a matter of importance to public health that you acknowledge the failings of this article, so that its findings can be more realistically appraised in decision-making about the use of SSRIs in children.
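An aside to make point 1 in their list concrete: here’s a minimal sketch of the two competing responder definitions as R functions [the function and argument names are mine; the thresholds are the ones quoted in the letter]:

    ## Protocol ["Method"] definition: HAM-D falls below 8 OR drops by at
    ## least 50% from baseline [the version with p = .11]
    responder_protocol <- function(baseline, endpoint) {
      endpoint < 8 | (baseline - endpoint) >= 0.5 * baseline
    }

    ## Published ["Results"] definition: HAM-D below 8 only [the version
    ## with p = .02]
    responder_published <- function(baseline, endpoint) {
      endpoint < 8
    }

    ## A subject going from 30 to 14 responds under the protocol definition
    ## [a 53% drop] but not under the published one
    responder_protocol(30, 14)    # TRUE
    responder_published(30, 14)   # FALSE

Shifting subjects between the responder and non-responder columns like this is all it takes to move a categorical p value – which is exactly what Jureidini and Tonkin were pointing at.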
With a careful reading, they saw through to the essence of what was wrong without the benefit of any of the back story, the raw data, or the numerous analyses that have followed over the years about this study. It’s a great example for all of us to emulate. Being a doctor is hard work by any standard, and we feel good about putting in all the extra time it takes to stay current. I doubt there’s any profession that can claim the "life-long-learning" moniker any more than we can. You never really graduate from medical school and there’s a never ending series of tests [AKA patients] as long as you’re in the game. So we get used to scanning, reading non-critically, in part because of the volume. But every one of us needs to learn how to recognize the signs that a given article needs to be read like Jon and Ann read this one. The modern industry sponsored Clinical Trial literature in all of medicine is filled with articles that need a long second look. Without thinking, I coined a phrase answering a reporter’s questions about our paper, "it wasn’t sin – it was spin." In the political arena, they call it plausible deniability. I don’t really believe it wasn’t sin [it may be the biggest sin of all because it’s the kind people get away with]. But the phrase still conveys a useful diagnostic take-home message to remind us what we’re on the lookout for…
Mickey @ 8:00 PM

study 329 ix – mystic statistics…

Posted on Thursday 17 September 2015

Most of us have an incomplete knowledge of Statistical Analysis unless we’ve had formal training and hands-on experience, yet we tend to accept the output from the computer’s statistical packages as if it’s dogma. In academic and commercial laboratories, we count on Statisticians [or trained SAS Programmers] to generate those abstract lettered indices that we discuss as if they’re absolutes – p, d, k, SEM, SD, NNT, OR, etc. And even the experts can’t check things with a pad and pencil. So we’re vulnerable to subtle [and even not so subtle] deceptions. In our RIAT Team’s reanalysis of Study 329, we had decided to follow the a priori protocol, which meant sticking to the protocol defined outcome variables and ignoring those later exploratory variables [in blue in Keller et al‘s Table 2] as discussed earlier.

The Study 329 protocol is clear and precise about statistical testing: parametric Analysis of Variance for the continuous variables and Logistic Regression for the categorical [yes/no] variables. They specified a model containing treatment and investigator, with contingencies for interactions between them [since I’ve already put the non-stat-savvy set to sleep, I’m going to dumb this down a bit going forward]. We noticed that our p values differed from those in both the Keller et al paper and the CSR [Full study report acute], even though our open source statistical package [R] is equivalent to their commercial package [SAS] – both available in the Secure Data Portal provided by GSK. While the results for the protocol-defined variables were not significant, the numbers still should’ve been close to the same. And there was something else. They were reporting statistics for Paroxetine vs Placebo and Imipramine vs Placebo, and saying that the study was not powered to test Paroxetine vs Imipramine – all pairwise comparisons. Why this was important takes a little explaining.
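For the stat-curious, here’s a minimal sketch of what those protocol-specified models look like in R [the data frame and variable names are hypothetical, and the real analysis also had to apply the protocol’s contingency rules for the treatment-by-investigator interaction]:

    ## hypothetical extract of the acute-phase dataset
    trial <- read.csv("study329_acute.csv")

    ## continuous outcome [e.g., change in HAM-D total score]:
    ## parametric ANOVA with treatment and investigator as factors
    fit_cont <- aov(hamd_change ~ treatment + investigator, data = trial)
    summary(fit_cont)    # the omnibus F tests live in this table

    ## categorical [yes/no] outcome [e.g., responder status]:
    ## logistic regression with the same model terms
    fit_cat <- glm(responder ~ treatment + investigator,
                   family = binomial, data = trial)
    summary(fit_cat)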

When a dataset has only two groups [as in a study of Paroxetine vs Placebo], pairwise statistical comparisons with something like the familiar t-test are perfectly appropriate. But when you run statistical comparisons on datasets with more than two groups, there’s a two-step process. First you test the whole dataset using an OMNIBUS statistical test like Analysis of Variance [ANOVA]. If the omnibus test is significant, then you can run pairwise tests between the various groups to find where the significance lies. But if the OMNIBUS test is not significant, it means no differences among the groups have been demonstrated – and that’s the end of that. The pairwise tests are immaterial no matter how they come out. Keller et al had skipped the OMNIBUS tests altogether [never mentioned in the protocol, the paper, or the CSR]. Our results were the OMNIBUS statistics, and that’s why they were different. With the protocol-defined variables under consideration, it didn’t matter, since nothing was significant no matter what your method. So the question became, "Why skip the OMNIBUS statistical tests?"
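Here’s a toy illustration of that two-step sequence in R [the numbers are simulated for the example – they are not Study 329 data]:

    ## three arms, thirty subjects each, with made-up change scores
    set.seed(329)
    toy <- data.frame(
      group  = rep(c("paroxetine", "imipramine", "placebo"), each = 30),
      change = c(rnorm(30, -10, 6), rnorm(30, -9, 6), rnorm(30, -8, 6))
    )

    ## step 1: the OMNIBUS test across all three groups at once
    omnibus   <- aov(change ~ group, data = toy)
    omnibus_p <- summary(omnibus)[[1]][["Pr(>F)"]][1]

    ## step 2: pairwise contrasts are only legitimate if step 1 passes
    if (omnibus_p < 0.05) {
      print(pairwise.t.test(toy$change, toy$group))
    } else {
      message("omnibus test not significant - pairwise comparisons are moot")
    }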

Since we decided to drop those non-protocol variables because they were declared post hoc [see the last two posts], we had never run the full statistical model analysis on them. But I remembered a spreadsheet we did on a rough pass through this data when we were first getting started. The results are shown here [the OMNIBUS tests are in the far right column and all significant values are shown in red]:

The protocol-specified variables [white background] are not significant, as reported by Keller et al. But look at the non-protocol variables [gray background]. Only two were OMNIBUS-significant. And look at the columns measuring strength of effect [EFFECT SIZE, NNT, ODDS RATIO]. Except for the HAM-D DEPRESSED ITEM, those exploratory variables are pretty lame [weak]. While this was a crude first take without considering the investigator covariate, it suggests that the OMNIBUS statistics didn’t help their cause, so they were conveniently ignored. That could offer a plausible explanation for why they skipped the OMNIBUS statistical test altogether [in fact, it’s the only explanation I can think of]. Recalling that spreadsheet, I went back and ran the "full monty" model on these variables, and three of them came in at or just under the p<0.05 wire after all: as expected, the HAM-D DEPRESSED ITEM yielded p=0.0032; the others barely grazed it – HAM-D REMISSION at p=0.0504 [actually a hair over the line] and CGI IMPROVEMENT at p=0.0493. Those last two were marginal at best, hardly seeming clinically relevant. And there was something else [see below]. The LOCF dataset for K-SADS-L was very difficult to judge, since it was an every-other-week metric and a number of subjects got off schedule, but for what it’s worth, I could never find the reported significance with various shots at defining the LOCF dataset. Running the full model, I got p=0.0833 OMNIBUS and p=0.0662 for Paroxetine vs Placebo.
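For anyone curious about that LOCF [last observation carried forward] step, here’s a bare-bones version in R [the visit scores below are made up]:

    ## carry the last non-missing score forward into each missed visit
    locf <- function(x) {
      for (i in seq_along(x)[-1]) {
        if (is.na(x[i])) x[i] <- x[i - 1]
      }
      x
    }

    visits <- c(week0 = 24, week2 = 20, week4 = NA, week6 = 15, week8 = NA)
    locf(visits)    # week4 inherits 20, week8 inherits 15

The trouble comes earlier – deciding which scheduled visit each off-schedule K-SADS-L rating belongs to before you can even run a fill like this – which is why I could never settle on one LOCF dataset.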

Just one more piece of techno-babble. There’s something more to say about those two minimally significant exploratory variables:

Both of the non-protocol categorical variables were significant only in week 8, suggesting to me that they were probably outliers [flukes]. And, as mentioned earlier, even if you include the rogue non-protocol exploratory variables, applying any correction for multiple variables would wipe out statistical significance for three of the four. That leaves the HAM-D DEPRESSED ITEM as the only statistically significant finding in this entire study – one question on a multi-item rating scale! So in order for Keller et al to reach the conclusion "Paroxetine is generally well tolerated and effective for major depression in adolescents," all three things had a part to play: no correction for multiple variables; redefining a priori to mean before the blind is broken rather than before the study begins; and ignoring the OMNIBUS statistical test.
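To put numbers on that claim, here are the four exploratory p values quoted above run through a Holm correction in R [the choice of method and of family is mine – any standard correction tells the same story]:

    p_raw <- c(hamd_depressed_item = 0.0032,
               hamd_remission      = 0.0504,
               cgi_improvement     = 0.0493,
               ksads_depression    = 0.0662)
    round(p.adjust(p_raw, method = "holm"), 4)
    ## hamd_depressed_item      hamd_remission     cgi_improvement
    ##              0.0128              0.1479              0.1479
    ##    ksads_depression
    ##              0.1479

Only the HAM-D DEPRESSED ITEM survives – the other three are wiped out, exactly as advertised.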

I know these posts are TMI [too much information], so this is the end of all my number chatter. To my way of thinking, Study 329 has become a paradigm – emblematic of the widespread, subtle distortion of the tools of scientific analysis in Clinical Trials in the service of commercial gain. We wrote this RIAT paper to correct the existing scientific literature, but also to send a clear message: if you publish Clinical Trials that disseminate misinformation to physicians and patients, they might just be coming right back at you. And in the future, with greater Data Transparency and awareness, it won’t take any fourteen years to make the circuit…
Mickey @ 4:00 PM