Dorothy Bishop is a developmental psychologist who focuses on dyslexia and other language disorders. This is not an article, just the slides from a presentation she gave to the Rhodes Biomedical Association last week on the reproducibility crisis. Her slides tell a story well known to us. And the problem isn’t the science; it’s the scientists. She starts with some familiar methods used to distort findings. I’ve synopsized those opening slides:
- Publication bias: burying negative studies
- HARKing: Hypothesizing After the Results are Known
- p-hacking: trying different statistical tests on various datasets until you get the result you want
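To make the p-hacking bullet concrete, here is a minimal Python sketch. It assumes k independent outcome measures that are all pure noise, 30 observations per outcome, and a two-sided z-test with known variance; these are illustrative choices, not taken from Bishop's slides, and the function names `z_test_p` and `false_positive_rate` are hypothetical. A simulated "study" counts as a false positive if any one of its k outcomes reaches p < 0.05.

```python
import random
from statistics import NormalDist, fmean

def z_test_p(sample, sigma=1.0):
    """Two-sided one-sample z-test of H0: mean = 0, with sigma assumed known."""
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(k_outcomes, n=30, trials=2000, alpha=0.05, seed=1):
    """Share of null 'studies' in which at least one of k outcomes reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # every outcome is pure noise: the true effect is exactly zero
        p_values = [
            z_test_p([rng.gauss(0, 1) for _ in range(n)])
            for _ in range(k_outcomes)
        ]
        # the p-hacker reports whichever outcome "worked"
        if min(p_values) < alpha:
            hits += 1
    return hits / trials

for k in (1, 5, 20):
    print(f"{k:>2} outcomes tried: false-positive rate {false_positive_rate(k):.3f}")
```

With a single prespecified outcome the false-positive rate should sit near the nominal 5%; letting the analyst pick among 5 or 20 outcomes pushes it to roughly 23% and 64%, even though there is nothing to find.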
[Translated and Annotated by Eric-Jan Wagenmakers, Denny Borsboom, Josine Verhagen, Rogier Kievit, Marjan Bakker, Angelique Cramer, Dora Matzke, Don Mellenbergh, and Han L. J. van der Maas] from the Psychological Laboratory of the University of Amsterdam
Abstract:
Adrianus Dingeman de Groot [1914–2006] was one of the most influential Dutch psychologists. He became famous for his work “Thought and Choice in Chess”, but his main contribution was methodological — De Groot co-founded the Department of Psychological Methods at the University of Amsterdam [together with R. F. van Naerssen], founded one of the leading testing and assessment companies [CITO], and wrote the monograph “Methodology” that centers on the empirical-scientific cycle: observation–induction–deduction–testing–evaluation. Here we translate one of De Groot’s early articles, published in 1956 in the Dutch journal Nederlands Tijdschrift voor de Psychologie en Haar Grensgebieden. This article is more topical now than it was almost 60 years ago. De Groot stresses the difference between exploratory and confirmatory [“hypothesis testing”] research and argues that statistical inference is only sensible for the latter: “One ‘is allowed’ to apply statistical tests in exploratory research, just as long as one realizes that they do not have evidential impact”. De Groot may have also been one of the first psychologists to argue explicitly for preregistration of experiments and the associated plan of statistical analysis. The appendix provides annotations that connect De Groot’s arguments to the current-day debate on transparency and reproducibility in psychological science.
Specifically, De Groot makes three important interconnected points. The first point is that exploratory analyses invalidate the standard interpretation of outcomes from hypothesis testing procedures. “Exploratory investigations differ from hypothesis testing in that the canon of the inductive method of testing is not observed, at least not in its rigid form. The researcher does take as his starting-point certain expectations, a more or less vague theoretical framework; he is indeed out to find certain kinds of relationships in his data, but these have not been antecedently formulated in the form of precisely stated «testable» hypotheses. Accordingly they cannot, in the strict sense, be put to the test.” «De Groot, 1969, p. 306». Indeed, in exploratory work: “The characteristic element of ‘trying out whether …’ is present, but in such a way that the researcher’s attitude in fact boils down to ‘let us see what we can find.’ Now what is ‘found’ — that is, selected — cannot also be tested on the same materials” «De Groot, 1969, p. 307»… The second, related, point that De Groot makes is the pressing need to distinguish between exploratory and confirmatory «“hypothesis testing”» analyses. De Groot reiterated this point in his book “Methodology”: “It is of the utmost importance at all times to maintain a clear distinction between exploration and hypothesis testing. The scientific significance of results will to a large extent depend on the question whether the hypotheses involved had indeed been antecedently formulated, and could therefore be tested against genuinely new materials. Alternatively, they would, entirely or in part, have to be designated as ad hoc hypotheses, which could, emphatically, not yet be tested against ‘new’ materials.” «De Groot, 1969, p. 52».
Indeed, De Groot believed that it was unethical to blur the distinction between exploratory and confirmatory work: “It is a serious offense against the social ethics of science to pass off an exploration as a genuine testing procedure. Unfortunately, this can be done quite easily by making it appear as if the hypotheses had already been formulated before the investigation started. Such misleading practices strike at the roots of ‘open’ communication among scientists.” «De Groot, 1969, p. 52». This point was later revisited by Kerr «1998» when he introduced the concept of HARKing «“Hypothesizing After the Results are Known”», as well as by Simmons et al. «2011», John et al. «2012», and Wagenmakers, Wetzels, Borsboom, and van der Maas «2011»…
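De Groot's claim that what is "found" cannot also be tested on the same materials can be illustrated with a small simulation. This is a sketch under assumed conditions (20 candidate variables, all pure noise, 30 observations each, a known-variance z-test); the function `explore_then_confirm` is hypothetical. It selects the most "promising" variable in an exploratory dataset, then tests it twice: once on the very data used to select it, and once on new data.

```python
import random
from statistics import NormalDist, fmean

def z_test_p(sample, sigma=1.0):
    """Two-sided one-sample z-test of H0: mean = 0, with sigma assumed known."""
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def explore_then_confirm(m=20, n=30, trials=2000, alpha=0.05, seed=2):
    """Select the best of m null variables, then test it on the same vs new data."""
    rng = random.Random(seed)
    same_hits = fresh_hits = 0
    for _ in range(trials):
        # exploration: m candidate variables, every one of them pure noise
        explored = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
        p_values = [z_test_p(x) for x in explored]
        best = p_values.index(min(p_values))  # "let us see what we can find"
        # "testing" the selected variable on the data used to select it
        same_hits += int(p_values[best] < alpha)
        # genuine test: the selected variable measured again on new materials
        # (under the null, new data for any variable is just fresh noise)
        fresh = [rng.gauss(0, 1) for _ in range(n)]
        fresh_hits += int(z_test_p(fresh) < alpha)
    return same_hits / trials, fresh_hits / trials

same, fresh = explore_then_confirm()
print(f"significant on the same data: {same:.3f}, on new data: {fresh:.3f}")
```

The in-sample "test" of the selected variable comes out significant in well over half of the runs, while the test on new materials falls back to about 5%, which is what a true null should give.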
The third point that De Groot makes concerns preregistration. De Groot strongly felt that in order for research to qualify as confirmatory «and, consequently, for statistical inference to be meaningful», an elaborate preregistration effort is called for: “If an investigation into certain consequences of a theory or hypothesis is to be designed as a genuine testing procedure «and not for exploration», a precise antecedent formulation must be available, which permits testable consequences to be deduced.” «De Groot, 1969, p. 69»…
-
Randomized Clinical Trials are not research [exploratory]; they’re product testing [confirmatory].
-
The a priori Protocol defines the analysis.
There is an obvious way to prevent that kind of data manipulation, cherry-picking, moving of goalposts, HARKing, and glossing over of adverse events. All it would take is for the FDA to require that the data be analyzed strictly according to the a priori protocol, with the FDA itself doing the analysis. That requirement would apply to any investigational new drug or to any approved drug being tested for a new indication. Corporations and investigators would be prohibited from reporting any analyses other than the FDA analyses. With an a priori protocol and plan of analysis, there should be no room for self-serving “creativity” by the corporations. As things stand, we have a Kabuki theater spectacle. The corporations don’t come clean about what they did, and the FDA doesn’t call them out when agency analyses disagree with the corporate line. The FDA may deny or delay approval, but it doesn’t go to the literature, as Jureidini, Amsterdam, and McHenry did here, to challenge the distorted corporate analyses reported there.
This requirement would put an end to creative manipulation of the clinical trials literature. As Dr. Mickey often says, this is not high science but rather product testing. Thus, corporations cannot claim a privileged role in conducting the statistical analyses. We need a clinical trials equivalent of Underwriters Laboratories. Before licensing, nobody takes the manufacturing corporation’s word for it concerning the safety and performance of X-ray machines or CT scanners or cardiac defibrillators. Why should we treat drugs any differently?…
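The a priori protocol idea can be sketched in a few lines of Python. Everything here is hypothetical: `Protocol`, `analyze_per_protocol`, the outcome names and the toy data are illustrative, and this is not a description of any existing FDA system. The plan is frozen and fingerprinted with SHA-256 before unblinding; the analysis function then takes the outcome and test from the registered plan and refuses to run if the plan has been changed, so switching to a more favorable outcome after the fact is rejected rather than reported.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from statistics import NormalDist, fmean

@dataclass(frozen=True)
class Protocol:
    """Hypothetical a priori analysis plan, fixed before any data are seen."""
    primary_outcome: str
    alpha: float = 0.05
    test: str = "two-sided one-sample z-test, sigma known"

    def fingerprint(self) -> str:
        # SHA-256 of the plan as registered; any later edit changes the hash
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

def analyze_per_protocol(protocol, registered_hash, data, sigma=1.0):
    """Analyze only what the registered plan specifies; refuse anything else."""
    if protocol.fingerprint() != registered_hash:
        raise ValueError("plan differs from the registered protocol")
    sample = data[protocol.primary_outcome]  # no choice of outcome at analysis time
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"outcome": protocol.primary_outcome, "p": p, "significant": p < protocol.alpha}

# Registered before unblinding (toy data below are invented for illustration)
registered = Protocol(primary_outcome="primary_score").fingerprint()
trial_data = {"primary_score": [0.1, -0.2, 0.3, 0.0],
              "secondary_score": [2.0, 2.1, 1.9, 2.2]}
print(analyze_per_protocol(Protocol("primary_score"), registered, trial_data))

try:
    # switching to the outcome that "worked" after seeing the data
    analyze_per_protocol(Protocol("secondary_score"), registered, trial_data)
except ValueError as err:
    print("rejected:", err)
```

A hash only helps if it is lodged with a third party before the data exist; the point of the sketch is that once the protocol is fixed, the analyst has no choices left to make.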
This is critically important but I am not sure that most people have an intuitive understanding as to why.
Almost any event is highly improbable. A car parked on my street has license plate 267 OVG. The probability of a car having that license plate is <0.0000006. Highly statistically significant. It must mean something.
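The license-plate arithmetic can be checked directly. This sketch assumes a hypothetical plate format of three digits followed by three letters, with every character equally likely and independent; real plate allocation is not that uniform.

```python
from fractions import Fraction

# Assumed plate format: three digits then three letters, every character
# equally likely and independent (real plate-issuing rules are not this simple)
n_plates = 10 ** 3 * 26 ** 3
p_this_plate = Fraction(1, n_plates)

print(n_plates)              # 17576000
print(float(p_this_plate))   # roughly 5.7e-08, below the 0.0000006 quoted above

# Yet every parked car carries *some* plate: summed over all possible plates,
# the probability that the car has one of them is exactly 1.
print(p_this_plate * n_plates)  # 1
```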
A while back I had a string of new women patients with the first name of Jennifer. The distribution of names among my women patients became so skewed that over 10% of them were named Jennifer. This is highly unlikely, and I probably could have published an article on how naming one's daughter Jennifer predisposed her to having a psychiatric disorder.
This is a little different from the correlation = causation fallacy. The point is that while any single, specified unlikely event is unlikely, some unlikely event or other is actually quite likely to occur.
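The point about unlikely events can be put as a formula: if each of n independent "chances" has a small probability p of producing a striking coincidence, the probability that at least one occurs is 1 - (1 - p)^n. Here is a minimal Python sketch, with a hypothetical 1-in-100 event and independence assumed (in the Jennifer story the chances might be different names, months, or diagnoses):

```python
def p_at_least_one(p_each: float, n_chances: int) -> float:
    """Probability that at least one of n independent events occurs,
    when each one has probability p_each on its own."""
    return 1 - (1 - p_each) ** n_chances

# A 1-in-100 coincidence, given more and more chances to happen
for n in (1, 10, 100, 500):
    print(n, round(p_at_least_one(0.01, n), 3))
```

A 1-in-100 coincidence is a long shot on any single look, but given 100 looks it turns up about 63% of the time, and given 500 looks about 99% of the time.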
Thanks for the link to Dr. de Groot's work. If only that could become more widely understood.