the hope diamond…

Posted on Tuesday 10 May 2016


[click image to link to her slides]

Dorothy Bishop is a Developmental Psychologist who focuses on Dyslexia and other Language Disorders. This is not an article, just the slides from a presentation she gave to the Rhodes Biomedical Association last week on the reproducibility crisis. Her slides tell a story well known to us. And the problem isn’t the science, it’s the scientists. She starts with some familiar methods used to distort findings. I’ve synopsized those opening slides:

  • Publication bias: burying negative studies
  • HARKing: Hypothesis After Results Known
  • p-hacking: trying different statistical tests on various datasets until you get the result you want
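There is simple arithmetic behind the p-hacking bullet. Under a true null hypothesis a p-value is uniformly distributed, so keeping the smallest p from K independent looks at the data drives the false-positive rate from 5% toward 1 − 0.95^K. A minimal sketch (the 20-test figure is illustrative, not from her slides):

```python
import random

random.seed(1)
ALPHA, K, N = 0.05, 20, 100_000

# Under the null, each test's p-value is uniform on [0, 1]. Trying K tests
# and keeping the smallest p inflates the chance of a "significant" finding
# in each simulated study from 5% to roughly 64%.
hits = sum(min(random.random() for _ in range(K)) < ALPHA for _ in range(N))

print(f"analytic : {1 - (1 - ALPHA) ** K:.3f}")
print(f"simulated: {hits / N:.3f}")
```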
But then she talks about the various people along the way who had written about this, and her history starts with Adriaan de Groot [1956].
So, on a lark, I Googled Adriaan de Groot, and there was his article in full text [put there by Dorothy Bishop]…
[Translated and Annotated by Eric-Jan Wagenmakers, Denny Borsboom, Josine Verhagen, Rogier Kievit, Marjan Bakker, Angelique Cramer, Dora Matzke, Don Mellenbergh, and Han L. J. van der Maas]
from the Psychological Laboratory of the University of Amsterdam

Abstract:
Adrianus Dingeman de Groot [1914–2006] was one of the most influential Dutch psychologists. He became famous for his work “Thought and Choice in Chess”, but his main contribution was methodological — De Groot co-founded the Department of Psychological Methods at the University of Amsterdam [together with R. F. van Naerssen], founded one of the leading testing and assessment companies [CITO], and wrote the monograph “Methodology” that centers on the empirical-scientific cycle: observation–induction–deduction–testing–evaluation. Here we translate one of De Groot’s early articles, published in 1956 in the Dutch journal Nederlands Tijdschrift voor de Psychologie en Haar Grensgebieden. This article is more topical now than it was almost 60 years ago. De Groot stresses the difference between exploratory and confirmatory [“hypothesis testing”] research and argues that statistical inference is only sensible for the latter: “One ‘is allowed’ to apply statistical tests in exploratory research, just as long as one realizes that they do not have evidential impact”. De Groot may have also been one of the first psychologists to argue explicitly for preregistration of experiments and the associated plan of statistical analysis. The appendix provides annotations that connect De Groot’s arguments to the current-day debate on transparency and reproducibility in psychological science.
Last week, I called the publication by Jureidini, Amsterdam, and McHenry the jewel in the crown… to metaphorically emphasize the importance of their article, which introduced subpoenaed internal corporate documents to illustrate the fraudulent underbelly of the 2004 Celexa RCT in adolescents. Well, I need an even greater superlative for the De Groot article Dorothy Bishop brings to us from a more naive time – how about the Hope Diamond? Since you’re unlikely to read the whole paper without a nudge, here’s its essence from the translator’s note in the Appendix:
Specifically, De Groot makes three important interconnected points. The first point is that exploratory analyses invalidate the standard interpretation of outcomes from hypothesis testing procedures. “Exploratory investigations differ from hypothesis testing in that the canon of the inductive method of testing is not observed, at least not in its rigid form. The researcher does take as his starting-point certain expectations, a more or less vague theoretical framework; he is indeed out to find certain kinds of relationships in his data, but these have not been antecedently formulated in the form of precisely stated ‘testable’ hypotheses. Accordingly they cannot, in the strict sense, be put to the test.” (De Groot, 1969, p. 306). Indeed, in exploratory work: “The characteristic element of ‘trying out whether …’ is present, but in such a way that the researcher’s attitude in fact boils down to ‘let us see what we can find.’ Now what is ‘found’ — that is, selected — cannot also be tested on the same materials” (De Groot, 1969, p. 307)…

The second, related, point that De Groot makes is the pressing need to distinguish between exploratory and confirmatory [“hypothesis testing”] analyses. De Groot reiterated this point in his book “Methodology”: “It is of the utmost importance at all times to maintain a clear distinction between exploration and hypothesis testing. The scientific significance of results will to a large extent depend on the question whether the hypotheses involved had indeed been antecedently formulated, and could therefore be tested against genuinely new materials. Alternatively, they would, entirely or in part, have to be designated as ad hoc hypotheses, which could, emphatically, not yet be tested against ‘new’ materials.” (De Groot, 1969, p. 52) Indeed, De Groot believed that it was unethical to blur the distinction between exploratory and confirmatory work: “It is a serious offense against the social ethics of science to pass off an exploration as a genuine testing procedure. Unfortunately, this can be done quite easily by making it appear as if the hypotheses had already been formulated before the investigation started. Such misleading practices strike at the roots of ‘open’ communication among scientists.” (De Groot, 1969, p. 52). This point was later revisited by Kerr (1998) when he introduced the concept of HARKing (“Hypothesizing After the Results are Known”), as well as by Simmons et al. (2011), John et al. (2012), and Wagenmakers, Wetzels, Borsboom, and van der Maas (2011)…

The third point that De Groot makes concerns preregistration. De Groot strongly felt that in order for research to qualify as confirmatory [and, consequently, for statistical inference to be meaningful], an elaborate preregistration effort is called for: “If an investigation into certain consequences of a theory or hypothesis is to be designed as a genuine testing procedure [and not for exploration], a precise antecedent formulation must be available, which permits testable consequences to be deduced.” (De Groot, 1969, p. 69)…
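De Groot's rule that what is "found" cannot also be tested on the same materials can be seen in a toy simulation (a sketch; the dataset sizes and all names are invented). We "explore" pure-noise data to find the best-looking predictor, then test that selected hypothesis both on the data that suggested it and on genuinely new material:

```python
import random

random.seed(2)

def fake_study(n=100, k=20):
    """Pure-noise data: k candidate predictors, an outcome unrelated to any of them."""
    X = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    return X, y

def corr(xs, ys):
    """Pearson correlation, stdlib only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

X, y = fake_study()
# "Exploration": select whichever predictor happens to correlate best.
best = max(range(20), key=lambda j: abs(corr([row[j] for row in X], y)))
r_explore = corr([row[best] for row in X], y)

# De Groot's rule: the selected hypothesis must face genuinely new material.
X_new, y_new = fake_study()
r_confirm = corr([row[best] for row in X_new], y_new)

print(f"on the data that suggested it: r = {r_explore:+.2f}")
print(f"on genuinely new data:         r = {r_confirm:+.2f}")
```

The selected correlation looks impressive on the data that produced it, precisely because it was selected; on fresh data it shrinks back toward the truth (zero), which is why the selection step has no evidential impact.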
My version:

  1. Randomized Clinical Trials are not research [exploratory], they’re product testing [confirmatory].
  2. The a priori Protocol defines the analysis.
Dr. Bernard Carroll‘s version was in this recent comment:
There is an obvious way to prevent that kind of data manipulation, cherry picking, moving of goalposts, HARKing, and glossing over of adverse events. All it would take is for the FDA to require that they analyze the data strictly according to the a priori protocol. That requirement would apply to any investigational new drug or to any approved drug being tested for a new indication. Corporations and investigators would be prohibited from reporting any analyses other than the FDA analyses. With an a priori protocol and plan of analysis there should be no room for self-serving “creativity” by the corporations.

As things stand, we have a Kabuki theater spectacle. The corporations don’t come clean about what they did and the FDA doesn’t call them out when agency analyses disagree with the corporate line. They may deny or delay approval, but the FDA doesn’t go to the literature, as Jureidini, Amsterdam, and McHenry did here, to challenge the distorted corporate analyses reported in the literature.

This requirement would put an end to creative manipulation of the clinical trials literature. As Dr. Mickey often says, this is not high science but rather product testing. Thus, corporations cannot claim to be privileged for conducting the statistical analyses. We need a clinical trials equivalent of the Underwriters Laboratory. Before licensing, nobody takes the manufacturing corporation’s word for it concerning the safety and performance of X-ray machines or CT scanners or cardiac defibrillators. Why should we treat drugs any differently?…
And Dr. Adriaan de Groot said it back when the world was young [1956]. It’s the Hope Diamond because it’s our only hope to stop the craziness of the last thirty-five years. Rigid enforcement of analysis that follows the registered a priori protocol is probably even more important than Data Transparency…
hat tip to Dorothy Bishop…  
Joseph Arpaia
May 11, 2016 | 10:47 PM

    This is critically important but I am not sure that most people have an intuitive understanding as to why.

    Almost any event is highly improbable. A car parked on my street has license plate 267 OVG. The probability of a car having that license plate is <0.0000006. Highly statistically significant. It must mean something.

    A while back I had a string of new women patients with the first name of Jennifer. The distribution of women's names became quite skewed, to the point where over 10% of my women patients were named Jennifer. This is highly unlikely, and I probably could have published an article on how naming one's daughter Jennifer predisposed her to having a psychiatric disorder.

    This is a little different from the correlation=causation fallacy. The point is that while any single, specific unlikely event is unlikely, it is actually quite likely that some unlikely event will occur.
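    The arithmetic behind this point is short (a sketch; the three-digit, three-letter plate format and the equal likelihood of every combination are assumptions):

```python
# A specific outcome can be wildly improbable even though *some* outcome is
# guaranteed. Assume (hypothetically) a plate is 3 digits + 3 letters, with
# every combination equally likely.
n_plates = 10 ** 3 * 26 ** 3            # 17,576,000 possible plates
p_specific = 1 / n_plates               # probability of seeing "267 OVG" exactly
p_some = n_plates * p_specific          # the parked car must carry *some* plate

print(f"P(this exact plate) = {p_specific:.1e}")   # tiny
print(f"P(some plate)       = {p_some:.0f}")       # certain
```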

    Thanks for the link to Dr. de Groot's work. If only that could become more widely understood.
