Posted on Tuesday 26 August 2014

My last post [and wasted research dollars…] lead me to Dr. Nemeroff’s 1984 paper announcing that Cortictrophin-Releasing Factor [CRF] is significantly elevated in the CSF [cerebrospinal fluid] of patients with Major Depressive Disorder – a thirty year old observation that has figured heavily in his research since then – culminating in the clinical trial listed in the last post. It’s a short paper in Science and it’s been open on my desktop for several days. The graph haunts me when I look at it. Here’s the abstract and that one figure from the paper, followed by a description of the analytic methods from the paper:
by Nemeroff CB, Widerlöv E, Bissette G, Walléus H, Karlsson I, Eklund K, Kilts CD, Loosen PT, Vale W.
Science. 1984 Dec 14;226(4680):1342-4.

The possibility that hypersecretion of corticotropin-releasing factor (CRF) contributes to the hyperactivity of the hypothalamo-pituitary-adrenal axis observed in patients with major depression was investigated by measuring the concentration of this peptide in cerebrospinal fluid of normal healthy volunteers and in drug-free patients with DSM-III diagnoses of major depression, schizophrenia, or dementia. When compared to the controls and the other diagnostic groups, the patients with major depression showed significantly increased cerebrospinal fluid concentrations of CRF-like immunoreactivity; in 11 of the 23 depressed patients this immunoreactivity was greater than the highest value in the normal controls. These findings are concordant with the hypothesis that CRF hypersecretion is, at least in part, responsible for the hyperactivity of the hypothalamo-pituitary-adrenal axis characteristic of major depression.

"The results (see Fig. 1) were statistically analyzed by both parametric [analysis of variance (ANOVA) and Student-Newman-Keuls test] and nonparametric (Mann-Whitney U test) methods."

One of the most frustrating things about papers like this is that the raw data isn’t available, even if one has the time to go over it in detail. Once again, I find myself looking at a graph that I’m told is meaningful, significant, has something to say important about a major psychiatric syndrome. And what I see looks like a trivial difference that is probably meaningless, and I even doubt significant. So I did something that I’ve been tempted to do many times. I opened it in a graphics program and reconstituted the data by measuring the pixel count to the center of each data point and using that table, the baseline, and the ordinate scale to reproduce the data. I wouldn’t recommend this on a Nobel Prize application or even in a paper, but I thought I’d give it a shot because I don’t believe the analysis is correct, or correctly done [the next paragraphs is only for the hardy].

So armed with my little made-up table, I proceeded to the analysis. It says that they used an ANOVA. That means considering the numbers as a continuous variable. In an ANOVA with four groups, first you check the whole dataset to see if there is any significance to the grouping. If there is, then you test the groups against each other to locate the significant difference. But with a small dataset like this where the assumptions of ANOVA [normal distribution] are questionable, it is more accurate to use a non-parametric statistic that only considers the ranking of each value, not its magnitude. With four groups, the drill is the same. First one tests the whole dataset to see if the grouping in significant [Kruskal-Wallis]. If it is, you test the groups against each other to find the significant differences [Mann-Whitney]. If you read their paragraph [in italics], it’s hard to figure out exactly what they did but it looks like some steps were skipped. Here’s my version using the R statistical package.

The top green value [p = 0.007656] says that the ANOVA is significant [p<0.05]. But in the table of pairwise comparisons, it’s not the difference between NORMAL and MDD that achieves significance [p = 0.10858]. In the non-parametric test, the overall Kruskal-Wallis test of the table is not significant [0.1485]. There’s nothing there. Whether my crude method is valid or not, it sure doesn’t say this:
The CSF concentration of CRF-LI was significantly increased (by both methods of statistical analysis) in patients with major depression compared to either the normal controls or the patients with schizophrenia or senile dementia."
My point in playing this little game is that we deserve the access to raw data for this very reason. This 30 year old study has been rehashed and discussed for years and has been nuclear to several grant requests, including the Clinical Trial in the previous post. It looks like a thorough vetting thirty years ago might well have put it to rest. I can’t find further studies to confirm this finding and nothing that suggests that this compound has any solid connection with PTSD. If you haven’t figured it out yet. I think this whole line of research is based on unsubstantiated speculations.

As you may recall, when we looked at Dr. Nemeroff’s NYU Grand Rounds and London lecture to the Institute of Psychiatry, we were alerted to a study reported as positive that Dr. Nemeroff, himself, had reported as based on an error so the significance disappeared, yet he presented it as a valid study in those presentations [see has to stop…]. So the best predictor of future behavior is past behavior. Now we have GSK, the VAH, and the NIMH chasing some new drug as a treatment for PTSD based on the very shakiest of speculations. Shame on him. Shame on them. And shame on journals that don’t vet questionable studies like this.

Maybe we ought to say shame on me too for using a pixel count to get my numbers. But instead of that – why not support Data Transparency so I don’t have to resort to extreme measures to confirm my reaction to that graph. Like I said, this kind of silliness has to stop…

Whoops: [for the even more hardy] I left out this plot from the R package. The upper and lower borders of the "boxes" represent 25% and 75% of the points. The fact that the Means [bold horizontal lines] aren’t centered in the box points to a skewing of the data [not normally distributed], suggesting that the ANOVA is not the best choice of statistics, and that the non-parametric test is a more appropriate choice [Kruskal-Wallis]. My method of data capture is also more likely to be accurate using only the rank order.

    August 26, 2014 | 10:20 PM

    In my current adventure of reading about the problems with genetic variation studies, I ran across a statement that some statisticians want to be consulted before a study is designed so that they can help to ensure that the study is well designed, rather than being called in to fudge the numbers in order to show significance afterward.

    I bet pharmaceutical companies hire statisticians to make results look significant. As you say, without the raw numbers, there is no way to evaluate the study. And, I would say, that without raw numbers, they have presented no evidence, just a marketing device in the form of a graph.

    Even if it could be reliably demonstrated to be true that X chemical is present in people with Y condition, so what? Does that really mean anything? Is that sufficient evidence with which to support a conclusion or a specific course of action? Using it at a diagnostic tool is wholly unsupportable without replication, peer review, and efforts to disprove it, imo, but maybe someone here can show me otherwise.

    August 26, 2014 | 10:51 PM

    It looks to me as if this study was originally focused on Alzheimers. The principles were at an aging center and they had primarily AZ patients [n=29]. Even if their stats were right, this study should be repeated with sufficient normals and MDD patients to confirm these findings. At least that’s what would make sense to me.

    August 26, 2014 | 11:24 PM

    O.K., I see your point. What I’m wondering is how is it legitimate to conclude that there is a causal relationship there? It seems to me that the body can’t be sorted into isolated systems and X means Y relationships when it comes to the brain/mind/mood. It’s like there is a drive to oversimplify for the sake of industry and profit. And when these conclusions from studies are imposed on individuals, then we’re way too far into the weeds.

    August 27, 2014 | 6:52 PM

    Great work, Dr. Mickey! I am continually amazed by your work. And I hope Dr. Carroll adds to this discussion as he has an eagle eye for this kind of data “mis-analysis.”

    James O'Brien, M.D.
    August 28, 2014 | 9:26 AM

    Nice analysis.

    The DST fad of the eighties was a harbinger of bio uber alles hype that continues to this day.

    However, I was only partially correct in calling this a blind alley in an earlier post.

    It was a blind alley to us, but for them an eight lane highway to more grant money.

    If I were 30 year old psych residency grad today, I’d go for a PhD in quantitative psychology. There is so much nonsense out there that shouldn’t be published.

    Steve Lucas
    August 28, 2014 | 10:21 AM


    Thanks for explaining the math. I can follow along, but this is light years ahead of anything I had in any class.

    Steve Lucas

    James O'Brien, M.D.
    August 28, 2014 | 12:28 PM

    Mickey’s blog should be a CME site for physicians, especially in the field of biostatistics which is being neglected at everyone’s peril. I’m relearning ideas I forgot by 30 and learning some new concepts as well. That was an incredible analysis of a thirty year old study that somehow passed muster. I blame the editors. Just looking at the charts without doing the math should raise eyebrows.


    We’ve basically thrown tens of millions into a money pit because one subject had an outlier of 138….

    August 28, 2014 | 4:56 PM

    Why the statistics here are a bit over my head, the observation of the elevated CRH levels major depressive patients jumps out to me and the personal searching for answers I have been doing. I suffer from both depression and anxiety with minor relief from typical antidepressants. However I’ve started on Tianeptine, which seems to help restore or regulate the stress response, something that may seem relevant to the elevated CSF levels above. For you smarter folk out there, any relevance here?


    James O'Brien, M.D.
    August 28, 2014 | 7:06 PM

    I just have to ask, how did you develop such a deep understanding of statistical methodology? What was your exposure as an analyst to quantitative psych?

    August 28, 2014 | 7:38 PM

    I was a research guy in Immunology, a NIH fellowship included. I changed gears after training when I was forced to practice by being drafted. I realized that working with real people was what I was supposed to be doing. It didn’t take me long with that thought to decide that it was the people’s lives that I found so compelling, so I retrained in psychiatry and didn’t look back. I will admit to having had a lot of help from colleagues to revive my rusty stats and consider myself still a rookie…

    James O'Brien, M.D.
    August 28, 2014 | 7:47 PM

    It’s a challenge for me as well. Just so I get this right, and this intuitively makes sense to me, parametric methods are fine if the data points assume a Gaussian distribution within the groups, but if the sample size is small and there is a skewed distribution, then nonparametric methods are necessary such as K-W. Because the extreme outliers screw up averages in the first instance. Kind of like the simpler example of mean and median are the same in parametric analysis but not in nonparametric.

    Beyond that, I think the sample sizes are way to small anyway…

    August 29, 2014 | 1:12 AM

    In the big picture, perhaps the scientism in psychiatry is the child of the same scientism in evolutionary biology, which is based on a theory being used religiously as a premise, then being fashioned into a distorted and myopic lens with which to evaluate and/or justify or condemn everything human. This reductionist viewpoint gets wrapped up in pretty bows then offered to us as the pinnacle of pure and indisputable reason based on ’empirical’ evidence’; i.e, fudged numbers and subjective observations masquerading as ‘objectivity’ that can be measured (i.e., story telling they pull out of their butts with carefully crafted values and inventories to tilt the results in their favor).

    Anyone who rejects that lens and/or its popular ubiquity rejects science and reason itself— no matter how idiotic, inadequate, or crooked some of the conclusions are on their face and how utterly barren the approach is— oye— what hubris.

    We need more philosophers, more philosophers of science, and more philosophical scientists.

    And more poetry.

    No one who wants to get better needs Nemeroff.


    James O'Brien, M.D.
    August 29, 2014 | 12:06 PM


    Unfortunately that describes much of academia in general right now. It’s not just psychiatric studies that are bad, but scientific studies of all kinds are at an all time low in terms of ability to be replicated.

    I think there are just too many universities and too many people pretending to be serious when they just don’t have the tools. But careerism reigns.

    August 29, 2014 | 3:08 PM

    I think people in the sciences aren’t getting proper training in critical thinking and, as you say, careerism reigns,

    James O'Brien, M.D.
    August 29, 2014 | 6:03 PM

    1st day of medical school ought to include lectures on the philosophy of science including Popper, Kuhn, et.al.

    Many medical students and scientists think that if if an idea can’t be disproven the theory is strong, they need to understand the opposite is true.

    I’m personally embarrassed by the fact I didn’t discover Meehl well into middle age.

    August 29, 2014 | 9:44 PM

    I wish I had had the education that I got in college in high school, and since I’ve seen the demands of the liberal arts for undergraduates degraded to the point that in my last stint in college, no blue book tests were given. Multiple choice and fill in the blank, or one sentence answers typically only require recall, not thinking.

    Mark Hochhauser, Ph.D.
    August 30, 2014 | 6:27 AM

    From a semi-retired psychologist: The small sample size is very troubling. They should have done a statistical power analysis to determine the appropriate sample size for this study. Plus, patient demographics are crucial if they want to generalize from the sample (aging patients?) to the general population (young, middle-aged, elderly). Were the four groups matched, in terms of age, gender, ethnicity, health status, etc? If not, then their statistical comparisons are biased.

    Steve Lucas
    August 30, 2014 | 10:31 AM


    Your point has been noticed by others in medicine and they are questioning the “evidence” provided them by pharma and others.


    Math aside you look at sample size and sample participants and logic dictates you question the results. I was educated BC (before computers) and we were taught that before you go to the trouble of doing the math, did the whole thing make sense.

    Today with computer driven stat paks a marketing executive can play with the sample size, duration of the trial, or any other parameter and get the numbers he/she wants. The whole concept of science as above market manipulation goes out the window.

    Wiley has noted how education now seems driven by cut and paste all supplied by Google. The internet does not lie, ask any 20 year old.

    Steve Lucas

    Mark Hochhauser, Ph.D.
    August 30, 2014 | 10:59 AM

    Steve, like you I was educated before computers (PhD in 1972) and learned the difference between research methods and statistics.With large enough samples, almost any difference will be statistically significant. The emphasis on statistical significance can mask fatally flawed research designs–leading to more and more junk science.

    August 30, 2014 | 4:22 PM

    I was quite surprised when I was told that the V.A. would pay for acupuncture to relieve nerve pain. Had they decided that the placebo effect was good enough— if it worked, it worked? Then recently I read about RCTs being done to legitimize alternative medicines. With large studies (or enough large studies) they were able to find statistical significance in their treatments. Then Cochrane came along, went through the studies and determined that there was a significance, so now, RCTs— which are widely believed to be the holy pinnacle of “science”— are being used to support the use of useless, and sometimes dangerous (like chelation) therapies that are being recommended by committees so that they are included in some guidelines.

    It’s sad to see medical science mocking itself like this.

    James O'Brien, M.D.
    September 2, 2014 | 12:59 PM

    There have been plenty of negative studies on this idea for decades but the grant money keeps pouring in:


Sorry, the comment form is closed at this time.