In a way, what I’m trying to do here is at cross purposes with some of my own beliefs. Short term Randomized Clinical Trials [RCTs] can tell us whether a drug has the desired medicinal properties; something about the strength of those properties; the incidence of early adverse effects; and can identify some but certainly not all serious toxicity. Long term trials, the experience of practicing clinicians, and the reports of patients taking these drugs are a much more important source of ongoing information. Our system of drug development and patenting has created a situation where the early clinical trials are way overvalued during the in-patent period [even the "good ones"], and often the medication harms are suppressed until they emerge in legal actions late in the day – after the damage has already been done. In addition, the early RCTs are industry productions, and many of them have made mincemeat of the scientific method they’re meant to represent. So I’m focusing on the efficacy testing of short term RCTs when we all know that the most important thing is the long term harms. But I base that approach on the medical principles of preventive medicine, specifically secondary prevention – early detection and rapid intervention. I know that the ultimate answer is Data Transparency and Independent Trial Analysis, but until that happens, educating as many people as possible about how to evaluate these early trials seems an essential way to intervene in the present…
The a priori Protocol:
A randomized, placebo controlled, double-blinded clinical trial [RCT for short] is not exploratory research. It’s more in the range of product testing. And its validity rests on a formal declaration of the details of how the study will be conducted and specifically what measurements will be taken, what will be the primary [and secondary] outcome variables, and how the data will be analyzed. The phrase a priori means before the study begins. We can never count on the sanctity of the blinding. People peek. But we can trust a formal Protocol filed before the study starts. So step one in evaluating a study is to get the Protocol, or as close to the Protocol as possible. That information should be in the article, but if it’s unclear, you can often get at it from the information on clinicaltrials.gov. I’ve yet to see a study where the Protocol was changed in midstream that didn’t smell like three day old fish. Likewise, if you can’t figure out what the Primary and Secondary Outcome Variables are, be suspicious of the whole enterprise.
Since we don’t have the luxury of seeing the raw data, we have to rely on Summary Data. In the case of Continuous Variables, that includes the Mean, either the Standard Deviation or the Standard Error of the Mean, and the Number of subjects [μ, σ or sem, and n]. In groups of sufficient size [>30], those three parameters are a reasonable proxy for the datasets. If the Summary Data is provided in the article, the task of quickly vetting a Continuous Variable is straightforward. Again, if you can’t figure out what the Summary Data is, be suspicious of the whole enterprise:
-
STEP 1: The OMNIBUS Statistic:
If there are more than two groups [eg placebo+several drugs, placebo+several doses of a drug, etc], the first order of business is to test the significance of the whole dataset with an analysis of variance [often called the Omnibus Statistic]. With Summary Data, one can use the Internet Calculator provided on John C. Pezzullo’s statpages – Analysis of Variance from Summary Data [see john henry’s hammer: continuous variables II…]. If the result is not significant, that means the differences among the group means are no larger than the variability within the groups would be expected to produce by chance, and no further testing is required. The study is negative for the tested variable – end of story. Even if STEP 2 produces pairwise significance, it is meaningless. There are plenty of studies that skip this step [notably Paxil Study 329]. Off-hand, I can think of no valid reason to skip it.
-
STEP 2: SIGNIFICANCE testing:
With Summary Data, the Pairwise Comparisons [placebo vs drug1, placebo vs drug2, drug1 vs drug2] are also straightforward. There are any number of Internet Calculators that produce the same results. I recommended GraphPad QuickCalcs because I like the interface [see john henry’s hammer: continuous variables I…]. The p-values are also invariably given in the article. Remember that p, the probability, is qualitative: the result either meets the significance threshold or it doesn’t. Comments like "almost significant", "very significant", "just barely significant" are common – and meaningless.
-
STEP 3: EFFECT SIZE testing:
The various EFFECT SIZES are mathematical indices aiming to quantify the effect of a drug – its strength. The most common index with Continuous Variables is Cohen’s d – the difference between the two Means divided by the Standard Deviation – also called the Standardized Mean Difference [SMD]. It is usually reported with its 95% Confidence Interval. I recommended using Psychometrica’s Comparison of groups with different sample size (Cohen’s d, Hedges’ g). While one uses the same parameters as those used for significance testing, the Effect Size estimates add a quantitative dimension to the results. This is often omitted from published papers. So at the end of STEPs 1-3, you will have a table that looks like this – plenty enough information for an informed opinion:
BREXPIPRAZOLE Trials in Schizophrenia

| STUDY | DRUG | MEAN | SEM | σ | n | p | d | lower | upper | anova p |
|---|---|---|---|---|---|---|---|---|---|---|
| Correll et al | placebo | -12.01 | 1.60 | 21.35 | 178 | – | – | – | – | 0.0002 |
| | 0.25mg | -14.90 | 2.23 | 20.80 | 87 | 0.3 | 0.14 | -0.120 | 0.393 | |
| | 2mg | -20.73 | 1.55 | 20.80 | 180 | <0.0001 | 0.41 | 0.204 | 0.623 | |
| | 4mg | -19.65 | 1.54 | 20.55 | 178 | 0.0006 | 0.36 | 0.155 | 0.574 | |
| Kane et al | placebo | -13.53 | 1.52 | 20.39 | 180 | – | – | – | – | 0.0025 |
| | 1mg | -16.90 | 1.86 | 20.12 | 117 | 0.1588 | 0.166 | -0.067 | 0.399 | |
| | 2mg | -16.61 | 1.49 | 19.93 | 179 | 0.1488 | 0.153 | -0.054 | 0.360 | |
| | 4mg | -20.00 | 1.48 | 19.91 | 181 | 0.0022 | 0.321 | 0.113 | 0.529 | |
-
STEP 2 and 3: SPREADSHEET shortcut:
I made a simple spreadsheet that creates this table [except for STEP 1] just by entering the Summary Data, to speed things up a bit. It’s downloadable here.
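For readers who would rather script STEPs 1-3 than click through the calculators, here is a minimal Python sketch working only from Summary Data. It is not the spreadsheet or any of the Internet Calculators mentioned above. The function names are made up for illustration, and it assumes a classic one-way ANOVA, a pooled-variance Student t-test, the sign convention d = (placebo − drug) ÷ pooled σ, and the 1.96-based Confidence Interval formula given later in this post. The p-values come from a textbook incomplete-beta routine so that nothing outside the standard library is needed. Published p-values often come from other models, so they need not match exactly.

```python
from math import exp, lgamma, log, sqrt

def _betacf(a, b, x, max_iter=500, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
             + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def sd_from_sem(sem, n):
    """Standard Deviation from the Standard Error of the Mean."""
    return sem * sqrt(n)

def anova_from_summary(groups):
    """STEP 1. groups is a list of (mean, sd, n). Returns (F, p)."""
    k = len(groups)
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df1, df2 = k - 1, total_n - k
    f_stat = (ss_between / df1) / (ss_within / df2)
    # Upper tail of the F distribution via the incomplete beta function.
    return f_stat, betai(df2 / 2, df1 / 2, df2 / (df2 + df1 * f_stat))

def compare(placebo, drug):
    """STEPs 2 and 3. Each argument is (mean, sd, n).
    Returns (two-sided p, Cohen's d, CI lower, CI upper)."""
    (m1, s1, n1), (m2, s2, n2) = placebo, drug
    df = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    t = (m1 - m2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    p = betai(df / 2, 0.5, df / (df + t * t))  # two-sided Student t
    d = (m1 - m2) / pooled_sd
    half = 1.96 * sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))
    return p, d, d - half, d + half

# Correll et al rows from the table above: (MEAN, σ, n)
correll = [(-12.01, 21.35, 178), (-14.90, 20.80, 87),
           (-20.73, 20.80, 180), (-19.65, 20.55, 178)]

f_stat, anova_p = anova_from_summary(correll)
print(f"omnibus: F = {f_stat:.2f}, p = {anova_p:.4f}")
for dose, row in zip(["0.25mg", "2mg", "4mg"], correll[1:]):
    p, d, lower, upper = compare(correll[0], row)
    print(f"{dose}: p = {p:.4f}, d = {d:.3f} [{lower:.3f}, {upper:.3f}]")
```

Run against the Correll et al rows, the d and Confidence Interval values should land close to the ones in the table. Small differences in the p-values are to be expected, since the table’s p-values may have been taken from the published analysis rather than recomputed this way.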
If the Summary Data isn’t provided, but the paper has the sample sizes and a p-value [p, n1, and n2], there is a way to extract the EFFECT SIZES. That shouldn’t happen, but it does. The procedure involves two Internet Calculators. The first converts the p-value into its z-score. You have to divide the p-value you’re given by two and then enter the result into John Walker’s Internet Calculator to extract the z-score – then enter that z-score and the sum of the sizes of the groups being compared into Psychometrica’s Computation of the effect sizes d, r and η2 from χ2- and z test statistics. And out comes Cohen’s d as if by magic! The Confidence Intervals can be generated with d, n1, and n2 and a ponderous formula:
CI[95%]=d±1.96×√(((n1+n2)÷(n1×n2))+d²÷(2 ×(n1+n2)))
[see john henry’s hammer: continuous variables III…]. I expect this ponderous formula will be automated on a spreadsheet coming soon to a blog near you. Thus ends my summary of the discussion of Continuous Variables. How to interpret that table comes at the end of this series. On to the Categorical Variables…
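The p-value route described above [p, n1, n2 → z → d → Confidence Interval] can also be sketched in a few lines of Python. This is an illustration, not the code behind either Internet Calculator. The function name is hypothetical, and the z-to-d step assumes one common conversion [r = z ÷ √N, then d = 2r ÷ √(1 − r²)], which may not be exactly what Psychometrica uses. The Confidence Interval line is the ponderous formula as written above.

```python
from math import sqrt
from statistics import NormalDist

def d_from_p(p_two_sided, n1, n2):
    """Estimate Cohen's d and its 95% CI from a two-sided p-value
    and the two group sizes. Returns (z, d, lower, upper)."""
    # Halve the p-value, then find the z-score with that upper-tail area.
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    total_n = n1 + n2
    # Assumed conversion: z -> r -> d (not confirmed as the calculator's method).
    r = z / sqrt(total_n)
    d = 2 * r / sqrt(1 - r * r)
    # CI[95%] = d ± 1.96 × √(((n1+n2) ÷ (n1×n2)) + d² ÷ (2×(n1+n2)))
    half = 1.96 * sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))
    return z, d, d - half, d + half

# Example: Kane et al, 4mg vs placebo, using only p and the group sizes
z, d, lower, upper = d_from_p(0.0022, 180, 181)
print(f"z = {z:.2f}, d = {d:.3f} [{lower:.3f}, {upper:.3f}]")
```

Because it starts from a rounded p-value, the result is only an estimate of d. With the Kane et al 4mg row it should come out near, but not identical to, the value computed from the Means and Standard Deviations in the table.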
In spite of an earlier career in hard science and a lot of statistical training and experience in a Jurassic Era, when I retired in 2003, I didn’t know how to do any of this. Worse, I didn’t even know it was there to do. I had moved in other directions and was a practicing psychotherapist by the time the IBM PC came along, or the Internet. I spent a lot of time with both, but it wasn’t doing this kind of thing. This looking at RCTs came after retirement and required a lot of catching up and pestering some very patient teachers to whom I will be eternally grateful. Back in the day, before I left Academia in the wake of the DSM-III, I had a job I loved – one I would’ve gladly done until retirement. I directed a Residency Training Program in Psychiatry. Having done three training programs myself, I know you never learn as much as you do in those first few years, and I loved being a part of that process and teaching the residents. It was a real loss to leave it. I guess I must still be at it in a way, because that’s where these posts are aimed. I’m trying to put together the collection of basic skills and simple tools that I wish I’d had in training or taught my residents. Back then, I would never have imagined that I would need to know how to spot deceit, sleight of hand, or sophisticated spin in the medical literature on a regular basis. But that’s the modern reality. And so that’s what I’m trying to work out in these posts. What does the new trainee need to learn out of the gate to prevent this kind of thing being perpetuated or repeated? Any and all help with that or suggestions appreciated…
Hey, I want to say I appreciate this series. I’m not a doc, and I don’t have any suggestions, but out of curiosity I’ve actually started to use some of this examining a few of the studies for a new drug called Addyi. I don’t have anything close to the knowledge necessary for a proper assessment (yet), but it looks like there’s some weirdness around the choice of outcome and the effect sizes are tiny. Thanks.
How to prepare residents for a lifetime of separating the wheat from the chaff? Eugene Stead at Duke liked to talk about the approach of See one, Do one, Teach one. These approaches and the online resources you have scoped out need to be in the toolkit of every residency journal club. The faculty teachers, hopefully without conflicts, can be role models for how to use these resources. The residents in turn can practice them in reviewing new publications, and the senior residents can teach the junior residents. That’s how a professional culture of rigorous evaluation can be shaped. Even better, a hands-on research exercise for the residents under the eye of a dependable faculty teacher will be the ultimate learning experience. Right now, residents and clinicians defer to the professional statisticians because they feel ignorant and overwhelmed, especially when multivariate statistics are being thrown around. These resources can empower them to challenge the “message” of corporate publications more confidently. Statistics is too important to be left to the professional statisticians. Your resources can help new learners to cut through the obfuscations of those professional statisticians who are employed by corporations to help put lipstick on the pig. Thank you for all this work!
Oooh, I really liked this:
“Back then, I would never have imagined that I would need to know how to spot deceit, sleight of hand, or sophisticated spin in the medical literature on a regular basis.”
I had never articulated it this way, but even outside of psychiatry, psychology, and medicine, these are exactly the skills that I seek to foster in my younger friends. I feel that in my generation, we placed a much higher value on the skill of knowing when we were being lied to.
(Of course, my younger friends would counter, why should they trust my generation, because we participated in making the world as deceitful and deliberately obfuscatory as it is. To which I can only say, “Guilty as charged, I did nothing useful in the ’80s and ’90s, here is my sword, slay me if you will, but hear me out first before you decide whether it might be put to better use.”)
@ Mark: Haven’t looked at that drug much, but yes– that is the feeling I started to get, like, all the time when I took Research and Writing back in 2009. “You are measuring… what? Why would you measure that for a drug that’s supposed to do this? Why would you exclude those subject– I can’t follow the train of logic supporting that decision.” And so on.
It does seem like a ridiculous drug. I wonder how many women participated in developing it. The approval history is bleak, too, just from what you can see on wiki– rejected in 2010 for not having enough impact on coprimary endpoint of increasing desire, resubmitted in 2013 and failed again… like, why did the FDA *recommend* less restrictive entry criteria? How much more about this drug do we need to know?
Addyi may be the stupidest drug ever approved by the FDA and promoted to the public.
Valeant paid a billion for the drug and 227 prescriptions were written last month.
http://www.bloomberg.com/news/articles/2015-11-17/valeant-s-newest-problem-the-female-libido-pill-isn-t-selling
It barely works at all, with an approximate NNT of 12 to achieve a “sexually satisfying event.”
At 780 bucks a month, I’d say a couple of nights at the Four Seasons or a weekend getaway (drinks included!) is a better investment.
There’s a lot of side effects and you can’t drink at all. And the side effects of combining it with alcohol were mostly studied in men.
http://nymag.com/scienceofus/2015/08/addyis-alcohol-safety-was-tested-mostly-on-men.html
If you have a yeast infection, you’re SOL because you can’t take diflucan with it.
There is a possible risk of increased breast cancer.
I don’t get Valeant as a business model going forward, especially in light of the Shkreli fiasco.
Valeant was basically doing a Shkreli with Wellbutrin in a stealth manner:
http://www.bloomberg.com/news/articles/2016-01-08/how-valeant-tripled-prices-doubled-sales-of-flatlining-old-drug
$1400 a month vs. $30 a month for Wellbutrin XL? The drug isn’t causing as many seizures as the sticker shock.
Oh, I don’t expect Valeant will be around too much longer. It’s a very troubled company, and if they don’t go bankrupt first I think Addyi may get pulled from the market before it harms too many people. Looking at Addyi studies is mostly a matter of curiosity for me. I suspect the mischief there is much more blatant than usual which makes it a good example for practice.
Citron, a short selling firm which is basically like a financial 1boringoldman.com, weighs in on the financial shenanigans
http://www.citronresearch.com/wp-content/uploads/2015/10/Valeant-Philador-and-RandO-final-a.pdf
Basically he is saying they are reporting nonexistent earnings through a straw purchaser of product at insanely inflated prices.
Mark, are you buying puts?
Hahaha, a small number a while back. 🙂 I’m not really an investor, but I’ve been following the Valeant drama for a few months now, since before that Citron report, actually. All low stakes for me, so it’s mostly been a matter of curiosity and seeing if I’ve any knack for the market. (If you’re curious, Roddy Boyd at SIRF and John Hempton have been more reliable sources on Valeant than Citron. Andrew Left tends to be rather sensationalistic.)
If Jim Grant is on board with his negative opinion, that’s good enough for me. Grant has been one of the most ethical and unbiased analysts I know of.
I don’t get Ackman’s position. He sees the roll-up as a feature…history says it’s a bug.
http://www.businessinsider.com/is-valeant-an-accounting-roll-up-2015-11
In the interest of fairness, his rebuttal:
http://finance.yahoo.com/video/ackman-valeant-not-just-rollup-110500575.html
Sorry, Bill, Valeant is not run by a Warren Buffett. Valeant is like Tyco.
P.S. note in the article that Berkshire (Munger) is saying, hell no, Valeant isn’t like Berkshire.
Yeah, there have been a few voices quietly raising concerns for over a year now. Back in 2014, Morgan Stanley told Allergan that Valeant was a “house of cards”, and I think we’re finally seeing that come to light. Even the Addyi deal looks like it could’ve been part of the Wall Street song-and-dance routine to keep the roll-up going. It would make more sense, at least. Anyone who looked carefully would realize flibanserin is nothing like a “female Viagra”.
I don’t get Ackman’s thinking either. My guess is that at a certain level of wealth it’s hard to avoid living in a bubble of yes-men. He probably got too close to management to remain objective and nobody was willing to sound the alarm.
The good news is that with all the attention on Valeant and Shkreli, I think we may see other pharma companies scared off from big price hikes on generic drugs for a while.
Well the sales figures show they dumped 1 billion down a money pit and they will never get that back…I am going to do further analysis on their balance sheet….thanks for the Chanos tip..I follow Bill Fleckenstein too and he thinks it is a train wreck