the psychologist
British Psychological Society
17th March 2015
[see also Is Science Broken? Let’s Ask Karl Popper]

A lively debate was held at London’s Senate House yesterday with panellists from neuroscience and psychology discussing the question: is science broken? If so, how can we fix it? The discussion covered the replication crisis along with areas of concern regarding statistics and larger, more general problems…
Neuroskeptic, a Neuroscience, Psychology and Psychiatry researcher and blogger, gave a personal perspective on problems with science, describing the events that led him to lose faith in research in the field. He said that as undergraduate students people are taught to do statistics in a very particular way, but once a person begins PhD research things change vastly. After gathering some results for his PhD research, Neuroskeptic found he had one significant result out of seven tasks performed by his participants. He said: ‘I thought back to my undergraduate days and thought “what if you do a Bonferroni correction across all the tasks?”. I got the idea that I’d suggest this to my supervisor, but I don’t think I ever did; I realised that just wasn’t how it was done. I was very surprised by this. I learned as an undergraduate that you do a Bonferroni correction if you have multiple tasks. I started to wonder: if we aren’t doing this, who else isn’t doing it? I began to lose faith in research in the field.’
Neuroskeptic said he wondered whether there was a good reason that multiple comparisons correction was not used. He added: ‘I still don’t think there’s a good reason we can’t do that. We have come to the tacit decision to accept methods which we would never teach undergraduates were a statistically good idea, but we decide that we’re happy to do them ourselves. That’s how I got on the road to blogging about these issues.’
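To make the arithmetic concrete, here is a minimal sketch in Python of the correction Neuroskeptic is describing. The seven p-values are invented for illustration [they are not his actual results]; a Bonferroni correction simply divides the usual 0.05 threshold by the number of tests, and a lone significant finding out of seven rarely survives it.

    # A minimal sketch of a Bonferroni correction across seven tasks.
    # The p-values are hypothetical, chosen only to mirror the story:
    # one of seven tasks clears 0.05 before any correction.
    p_values = [0.03, 0.21, 0.47, 0.09, 0.62, 0.33, 0.75]

    alpha = 0.05
    m = len(p_values)                # number of tasks [comparisons]
    corrected_alpha = alpha / m      # 0.05 / 7, about 0.0071

    print([p for p in p_values if p < alpha])            # [0.03]  "significant" uncorrected
    print([p for p in p_values if p < corrected_alpha])  # []      nothing survives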
My own biostatistics and research experience was in another medical field over forty years ago, so when I began to look at the math of clinical trials, it was familiar but only just. Besides coursework, my only hands-on experience was using ANOVA to partition the variance of interactions of effects, so there was much to learn. But I do have a Bonferroni Correction story to tell from those days. During an Immunology fellowship, my clinical work was with a Rheumatology Section. Rheumatology is like Psychiatry in that there are many conditions where the etiology [cause] was and is unknown. In the 1960s, Rheumatologists were collecting large databases on every patient they saw to develop criteria for diagnoses [sound familiar?]. Databases were new, as were the mainframe computers that held the data entered with punch cards and stored on tapes. Statistics were run with home-grown Fortran programs that ran overnight [if you were lucky]. Bill Gates hadn’t yet made it to high school. Excel was something you did in sports. And correcting for multiple variables was something kind of new.
One afternoon, the statistician and clinical staff blocked out a two-hour conference to show us the results from the clinical database they were collecting [with great pride]. It was one of those after-lunch conferences where the eyelids are hard to hold open. Towards the end, the statistician showed us a thick stack of computer printouts with all the significant findings: disorders across the top, parameters down the side, cells filled with probabilities. Then he said something like, "Of course we had to correct the statistics for multiple measurements." I don’t remember the term Bonferroni Correction, but I do remember what he did. He divided the significance threshold by the number of things measured, and then he showed a slide of what significance remained from that thick stack of printouts. It evaporated, and left a table that fit on one readable slide. I was pretty impressed, but he seemed deflated watching his fine p-values go up in smoke.
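For anyone curious why the significance evaporated, here is a toy reconstruction in Python. The grid of p-values is generated at random under the null [no real associations anywhere], which is exactly the situation a correction is meant to protect against: roughly 5% of the cells clear 0.05 by chance alone, and essentially none clear the corrected threshold.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stand-in for the printout: a grid of p-values with
    # disorders across the top and parameters down the side, all generated
    # under the null [no true effects anywhere].
    p_table = rng.uniform(size=(40, 50))

    alpha = 0.05
    m = p_table.size                 # 2,000 comparisons in the grid

    print(int((p_table < alpha).sum()))      # around 100 "significant" cells by chance
    print(int((p_table < alpha / m).sum()))  # usually 0 once the threshold is 0.05 / 2000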
"Dependent t tests were used to analyse changes in outcome measures for the normally distributed variables; non-parametric analyses using Wilcoxon’s signed ranks test were used for skewed data. Tests of significance were two-tailed, but no correction was made for multiple comparisons given that this was a feasibility study in which we were less concerned about type 1 error."[Note: A type I error is a false positive]
Mickey,
Thanks again for walking us through the math.
Steve Lucas
Gotta agree with your praise for Neuroskeptic. Rather than nom de plume, how about calling his moniker a nom de blog?
Thanks for posting this. It goes to what I have been saying that this is not just a psychiatry/pharma problem but a systematic problem in most academic research. Rigor is the enemy of logrolling and a padded CV. But as I have commented previously, archangels could be doing research and it wouldn’t matter if the study is based on DSM or bad statistical methodology. Look at internal medicine and all the issues we have had to revisit in terms of the treatment of lipidemia without heart disease/stroke and mild hypertension. Or something as basic as low fat/low carb in diet.
You can blame the journal editors as much as anybody including pharma and the KOLs.
Depression-era ‘Superstitions in Medicine’ mural by Bernard Zakheim
http://dahsm.ucsf.edu/superstitions-in-medicine-mural-by-bernard-zakheim-at-cole-hall/
Great post!
I agree with James O’Brien’s point that “this is not just a psychiatry/pharma problem but a systematic problem in most academic research.”
On the other hand, the problem is probably worse in the case of pharma research because the profit motive is added to the universal tendency of scientists (industry or academic) to want to confirm their hypothesis rather than refute it.
Financial incentive would certainly seem to supercharge confirmation bias, but I’m still kind of reeling from the implication that some disciplines are going to be more vulnerable, and the comparison between rheumatology and psychiatry.
It’s an odd coincidence that I saw my rheumatologist yesterday: he was staring at the EMR and shaking his head, saying, “Why do you get blood clots?” I pointed out that there is sort of a vague association between a positive anti-RNP (which I have) and blood clots, but that clearly didn’t satisfy him. (All the other antibody tests are negative, anticardiolipin, double-stranded DNA, blah, blah, blah.)
He seemed… well, almost angry, though not at me. And I thought, how frustrating it must be to practice in a specialty where the etiology is never completely clear. Who can live with that level of ambiguity, and for how long? At a certain point, the temptation to fix on an answer even if you didn’t have one would probably become a hazard. All of us probably know a psychiatrist who just gave up at a certain point, diagnosed most of his patients with the same thing, and prescribed most of them the same meds.
What do you guys think of Perneger’s idea that the Bonferroni correction is misleading, particularly if the multiple outcomes you’re measuring aren’t closely related? I think that’s what he’s saying, anyway. The link below is from some article he wrote in 1998. I don’t know who he is, but at first blush, he doesn’t seem to be shilling for anything.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1112991/
It really sucks getting interested in all this when I have virtually no natural aptitude for math. *sigh*