originator bias?…

Posted on Tuesday 1 September 2015

In our recent project, I had to bone up on my statistics. It was actually pretty interesting in that the statistical tests themselves haven’t changed all that much since my hard-science days. But it wasn’t like riding a bicycle exactly, more like going to a class reunion where there’s an awkward start, but with a little catching up, the old familiarity returns. While the statistics came back quickly, the implementation was all new. SPSS was unrecognizable. The newer SAS required SAS programming training. But then there’s R [just "R"], a free, open-source, command-line statistical package put together by the academic community that’s a thing of great beauty. But learning the various procedures, each carrying the idiosyncrasies of its individual creator, meant going through a number of tutorials along the way.
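For anyone who hasn’t seen R, here’s the flavor of what one of those procedures looks like – a made-up two-group comparison, nothing from our actual project:

    ## a toy two-sample comparison in base R, on made-up data
    set.seed(42)
    control   <- rnorm(20, mean = 10, sd = 2)
    treatment <- rnorm(20, mean = 12, sd = 2)

    t.test(treatment, control)       # Welch two-sample t-test: prints t, p-value, CI
    wilcox.test(treatment, control)  # the rank-based [nonparametric] alternative

Every procedure has its own arguments and output to learn, which is where the tutorials come in.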

That’s a very long introduction to this – a lot of the tutorials had examples from studies done by social psychologists. After all, who teaches the statistics courses? Often the statistics professors come from that very discipline. And over and over, working through the examples, I thought about the softness of the experiments compared to medicine [even psychiatry]. I don’t mean that disparagingly. It’s the nature of their subject matter. The study examples were kind of interesting in their own right, and I think they prepared me for this report about an article in Science [Estimating the reproducibility of psychological science] that was a major undertaking – having 100 studies from their main journals repeated by other, unrelated groups and comparing the outcomes. I wasn’t as surprised as the press seemed to think I ought to be at the low reproducibility figures:
New York Times
by Benedict Carey
AUG. 27, 2015

The past several years have been bruising ones for the credibility of the social sciences. A star social psychologist was caught fabricating data, leading to more than 50 retracted papers. A top journal published a study supporting the existence of ESP that was widely criticized. The journal Science pulled a political science paper on the effect of gay canvassers on voters’ behavior because of concerns about faked data.

Now, a painstaking years-long effort to reproduce 100 studies published in three leading psychology journals has found that more than half of the findings did not hold up when retested. The analysis was done by research psychologists, many of whom volunteered their time to double-check what they considered important work. Their conclusions, reported Thursday in the journal Science, have confirmed the worst fears of scientists who have long worried that the field needed a strong correction…

The vetted studies were considered part of the core knowledge by which scientists understand the dynamics of personality, relationships, learning and memory. Therapists and educators rely on such findings to help guide decisions, and the fact that so many of the studies were called into question could sow doubt in the scientific underpinnings of their work.

“I think we knew or suspected that the literature had problems, but to see it so clearly, on such a large scale — it’s unprecedented,” said Jelte Wicherts, an associate professor in the department of methodology and statistics at Tilburg University in the Netherlands…
New York Times
by Benedict Carey
AUG. 28, 2015

The field of psychology sustained a damaging blow Thursday: A new analysis found that only 36 percent of findings from almost 100 studies in the top three psychology journals held up when the original experiments were rigorously redone.

After the report was published by the journal Science, commenters on Facebook wisecracked about how “social” and “science” did not belong in the same sentence.

Yet within the field, the reception was much different. Along with pockets of disgruntlement and outrage — no one likes the tired jokes, not to mention having doubt cast on their work — there was a sense of relief. One reason, many psychologists said, is that the authors of the new report were fellow researchers, not critics. It was an inside job…
by the Open Science Collaboration
Science 349, aac4716 [2015].

Abstract:
Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.

Conclusion:
After this intensive effort to reproduce a sample of published psychological findings, how many of the effects have we established are true? Zero. And how many of the effects have we established are false? Zero. Is this a limitation of the project design? No. It is the reality of doing science, even if it is not appreciated in daily practice. Humans desire certainty, and science infrequently provides it. As much as we might wish it to be otherwise, a single study almost never provides definitive resolution for or against an effect and its explanation. The original studies examined here offered tentative evidence; the replications we conducted offered additional, confirmatory evidence. In some cases, the replications increase confidence in the reliability of the original results; in other cases, the replications suggest that more investigation is needed to establish the validity of the original findings. Scientific progress is a cumulative process of uncertainty reduction that can only succeed if science itself remains the greatest skeptic of its explanatory claims.

The present results suggest that there is room to improve reproducibility in psychology. Any temptation to interpret these results as a defeat for psychology, or science more generally, must contend with the fact that this project demonstrates science behaving as it should. Hypotheses abound that the present culture in science may be negatively affecting the reproducibility of findings. An ideological response would discount the arguments, discredit the sources, and proceed merrily along. The scientific process is not ideological. Science does not always provide comfort for what we wish to be; it confronts us with what is. Moreover, as illustrated by the Transparency and Openness Promotion [TOP] Guidelines [http://cos.io/top], the research community is taking action already to improve the quality and credibility of the scientific literature.

We conducted this project because we care deeply about the health of our discipline and believe in its promise for accumulating knowledge about human behavior that can advance the quality of the human condition. Reproducibility is central to that aim. Accumulating evidence is the scientific community’s method of self-correction and is the best available option for achieving that ultimate goal: truth.

I rearranged the frequency plots from the figure to clarify the central point. The effect sizes fell by half, and the number that were statistically significant fell by two-thirds. I guess they expected some fall in reproducibility, but nothing quite so dramatic. It’s a wake-up call for their field, actually for all of us – replication being the gold standard in scientific experimentation and analysis:

Reading through this paper, I don’t think there was so much of the kind of problem we so often run into in the clinical trials of medications I follow in this blog. There was open sharing of protocols, materials, and methodology between the original investigators and the groups repeating the studies – no ghostwriters or jury-rigged and obfuscated analyses. And yet the replication rate was still a lot lower than anticipated.

Even in this situation, absent the on-purpose biases we deal with in many of the pharmaceutical trials, it seems like there’s an intrinsic bias present when someone conducts a study of their own design. I’ll bet the bias-ologists have a name for it. Looking only at the effect sizes, in a repeat study by a non-originator the net strength of the effect generally falls, often precipitously – and not just for the weaker studies but, to a lesser extent, all across the range:

I realize that I’m shamelessly co-opting this data for my own purposes, but I just thought it was striking that in this study-of-studies, which is likely not so suffused with the on-purpose biases we’re looking for in RCTs of medications, the results of testing a pet hypothesis [or drug?] tend towards inflation [even without obvious "cheating"]. This may be a well-known phenomenon that some commenter can tell us all about, but it’s not so well known by me. It’s all the more reason to be pristine in conducting a trial or experiment and in looking for independent confirmation – replication. Meta-analysis won’t correct for this kind of originator bias, in that it’s usually a meta-analysis of a group of pet hypotheses…
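As an illustration of how this pattern can arise even with everyone playing fair, here is a minimal simulation sketch in R. The effect size, group sizes, and number of runs are made up, not taken from the paper; the only mechanism assumed is that "significant" original findings are the ones that get written up:

    ## toy simulation: selecting on p < .05 inflates the "original" effect sizes,
    ## while independent replications regress back toward the true effect
    set.seed(1)
    true_d <- 0.3      # assumed true standardized effect [made up]
    n      <- 30       # subjects per group, original and replication alike
    runs   <- 5000

    one_study <- function() {
      x <- rnorm(n, 0, 1)        # control group
      y <- rnorm(n, true_d, 1)   # "treated" group
      d <- (mean(y) - mean(x)) / sqrt((var(x) + var(y)) / 2)   # Cohen's d
      c(p = t.test(y, x)$p.value, d = d)
    }

    originals    <- t(replicate(runs, one_study()))
    replications <- t(replicate(runs, one_study()))

    published <- originals[, "p"] < 0.05          # only significant originals "count"
    mean(originals[published, "d"])               # inflated, well above 0.3
    mean(replications[published, "d"])            # back near the true 0.3
    mean(replications[published, "p"] < 0.05)     # only a minority stay significant

The numbers that come out aren’t the paper’s numbers – the point is only that an originator’s published effect can run well ahead of what an independent group will later find, with no cheating anywhere in the chain.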

I obviously spent some time thinking about this report. The authors seemed worried that they would discredit their discipline with this low reproducibility finding. I felt the opposite, impressed that they were examining the precision of their metrics. Because of the subjectivity of the social sciences, it felt like familiar territory to my own corner of things, psychotherapy, where confirmation is so ethereal and replication is king. But I also thought that it was a humbling reminder that our scientific evidence-based tools [our graphs, and tables, and statistics, etc.] are just crude attempts to simplify and objectify the world around us – mere proxies for the infinite variability of the nature we’re trying to understand…
  1.  
    James O'Brien, M.D.
    September 1, 2015 | 4:47 PM

    Actually the results are better than I thought they would be and better than you find in more objective fields such as oncology.

  2.  
    Tom
    September 2, 2015 | 10:20 AM

    The situation is even worse with cancer studies. To wit:

    “During a decade as head of global cancer research at Amgen, C. Glenn Begley identified 53 ‘landmark’ publications — papers in top journals, from reputable labs — for his team to reproduce. Begley sought to double-check the findings before trying to build on them for drug development. Result: 47 of the 53 could not be replicated. He described his findings in a commentary piece published on Wednesday in the journal Nature (paywalled). … But they and others fear the phenomenon is the product of a skewed system of incentives that has academics cutting corners to further their careers.”

  3.  
    Katie Tierney Higgins RN
    September 3, 2015 | 12:07 AM

    “… But they and others fear the phenomenon is the product of a skewed system of incentives that has academics cutting corners to further their careers.”

    I think the skewed system of incentives has also led academics far afield from the purpose of their research. Rather than advancing the knowledge base of any given specialty for the purpose of improving treatments or therapies, the goals of research seem to have become skewed by our “Market Driven Medicine” system – which is dependent on credentialed experts to sustain its market value. Over the past two decades there has been a subtle shift in priorities. What is studied, and how it is studied, now reflects the personal bias of academics who seem preoccupied with personal gain. This is in contrast to the study and research that culminated in the two-volume text “Developmental Neuropsychiatry,” published in 1998 by James C. Harris, M.D., Director of Developmental Neuropsychiatry, Johns Hopkins University School of Medicine. Dr. Harris made the point I am trying to make with this quote, at the top of the preface to his work:

    “Anomalies when rightly studied yield rare instruction; they witness and attract attention to the operation of hidden laws or of known laws under new and unknown conditions; and so set the inquirer on new and fruitful paths of research.” – H. Maudsley, 1880

    What does he mean by “rightly” studied? I think it is evident in Dr. Harris’s publication; the work, collaboration, debate, analysis, and synthesis of a coherent theory is the scientific process, lengthy and tedious, that should occur before publication.

    In fields that are lacking in known laws and hard science, like psychology and psychiatry, we see repeated efforts to invent the science that supports the practices. Or rather, thousands of studies published in professional journals. Who benefits from these publications?

    I’m willing to bet that the best practitioners in psychology and psychiatry – those who have the highest rate of successful outcomes and would receive the highest praise from their patients and their patients’ significant others – are not following the treatment guidelines (and little else) that are manufactured by academic research these days. I figure that truly sterling psychologists and psychiatrists are now something of an anomaly, and if *rightly* studied, would lend credence to old-school, lifelong study, critical thinking, and attentiveness to patients. I imagine these rare birds are also fairly eclectic in their educational pursuits and overall are well-rounded, happy people.

    The rest, or the majority of licensed professionals in these fields, have taken creative license to redefine Science itself, in concert with our terribly corrupt academics. When will the harm this wreaks on vulnerable people be confronted?

  4.  
    Ferrell Varner
    September 3, 2015 | 7:28 AM

    If the social sciences were truly “sciences” we could program our welfare state to work much better.

  5.  
    James O'Brien, M.D.
    September 3, 2015 | 10:58 AM

    The above two posts miss the point. It’s not the “softness” of psychology that makes the studies unreliable… in fact, as Tom pointed out, they are more reliable than oncology. It’s that academic fraud and sloppiness in general are so widespread. It’s so easy to go off the rails even if you’ve done something great. See Linus Pauling, Alfred Russel Wallace.

  6.  
    Katie Tierney Higgins RN
    September 4, 2015 | 12:34 PM

    My point about the “softness” of psychology with regard to the published studies is not about impacting reliability, but relevance to issues in the patient population. The studies seem to reflect a bias toward the guild interests of the profession; that is to say, to maintain their market value the academics have adopted something like a creative license to invent the disorders and conditions they can market for their respective profession to treat. In medical research, the corollary to what I am referring to is treating *risk* factors for disease. By the time the longitudinal studies (e.g., the Framingham Study) prove the drugs are not working as risk prevention and have serious adverse-effect risks, the mindset of this treatment is well ingrained and difficult to change.

    Taking advantage of wide acceptance of short-term studies that all seem to reflect guild interests, as opposed to longer-term, varied approaches to scientific investigation of anomalies, is the signature approach of academics hailing from the softest medical science, psychology/psychiatry.

    Maybe a better way of describing the distinction is to note that what is lacking in hard science is made up for with authoritative invention of disorders and conditions that are proffered as new science by proxy – as if, when the scientific method is employed, the conclusion of the study IS science. When, in fact, nothing could be further from the truth.

    Studies reflecting the pervasive needs of patients treated according to guidelines and therapies that were born from the past two decades of academic research streamlining would open “new and fruitful paths of research”.

    It would be interesting to read, in our American professional journals, the scientific evidence for a myriad of community-based, non-professional, humanistic interventions for people who got stuck on the conveyor belt of the latest psychology/psychiatry treatments. Would any of our academic research institutions receive grants and funding for replicating the work of “The Family Care Foundation” in Gothenburg, Sweden? See “Nine Lives, Stories of Ordinary Life therapy from Sweden” by Hanna Sundblad-Edling.
