I Have Seen The Future Of Science And It’s Spin All The Way: Some Thoughts On Daryl Bem’s Habituative Precognition Experiments, and Science by Media
Firstly, a warning. This is not an article on Daryl Bem’s experiments, or on whether they suggest precognition exists. I have no idea. This is an article on hype, spin, the media and lousy science reporting, and on how some people base their conclusions on what they see on Twitter rather than on any consideration of the evidence. It’s long, but it is fully linked, and at the end there is a twist worth waiting for, I promise, one you won’t see coming.
OK, so let’s start with what to some may be a shocking admission. I don’t really believe in psi, that is, the parapsychological notion of a sort of ESP effect that we may all have. That is not to say that I don’t believe in some “paranormal” claims; I just find the whole notion of psi philosophically difficult, and am not convinced the psi hypothesis actually works. In fact, I have at times stated that invisible goblins may be just as useful a hypothesis. If I were to write about my issues with psi, no one would ever read any further, so I’ll leave that for now; but I thought it best to get it out of the way.
Secondly, and to be frank, ever since I was a young child psi research has bored me to tears. I am primarily interested in apparitions and poltergeists, and hanging around at night in dark spooky places with pretty young girls. OK, that’s a joke, but I am a ghost/poltergeist researcher, and if you talk to me about Ganzfeld or PEAR random number generator experiments for more than a few minutes my eyes glaze over and I try to shift the topic to EastEnders, which I last watched when Dirty Den threw Angie out in the snow.
Flippancy aside, I still find myself drawn into discussions of ESP and psi from time to time. OBE, NDE and Remote Viewing I can (and largely do) completely ignore, but research into ESP and Mediumship has a pretty direct bearing on my specialist area, and PK too, so I read as much of the peer-reviewed literature as I find time for. (This certainly makes me a freak as far as “ghosthunters” go: despite twenty years of personal effort, the gap between the ghosthunting community and the academic parapsychologists seems to me to grow wider each year.)
Now, I appreciate most people don’t have time for this stuff, so I’ll summarize a rather complex story, and in doing so mangle it. I’ll link to relevant pages for those interested in reading deeper, though, so if you want to you can be fully informed. I am in no position to judge the science behind these experiments (well, I am, but life is short, I have not eaten yet, and I don’t know that I can do them justice), so instead here is the VERY QUICK version…
A famous American psychologist, Daryl Bem, conducted a series of clever experiments based on a well-known psychological effect (called “habituation”) to see if people could predict the immediate future. Some involved looking at erotic images but, rather boringly, the one everyone is interested in seemed to show that if you had to recall a list of words, you recalled more of the words you were subsequently shown again than of the words you were not.
To be fair, my immediate thought was “well, maybe some words are just more memorable than others”, but the randomization should take care of that: statistically, there should be no difference in memorability between the two sets of words, those shown afterwards and those not shown. Having dismissed this thought, my second was “this is chronically dull; I wonder if any exciting cat videos are on YouTube?”
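The randomization argument can be checked with a toy simulation. This is a minimal sketch, not Bem’s software: it assumes 48 words with made-up memorability scores, and a computer that picks half of them at random, after the recall test, to be shown again. The function name `simulate_split_difference` and the word counts are illustrative assumptions.

```python
import random
import statistics

def simulate_split_difference(n_words=48, n_practice=24, n_runs=10_000, seed=1):
    """Average memorability gap between randomly chosen 'shown' words and the rest."""
    rng = random.Random(seed)
    # Hypothetical memorability scores: some words are simply easier to recall than others.
    memorability = [rng.random() for _ in range(n_words)]
    diffs = []
    for _ in range(n_runs):
        # The words to be shown again are picked at random, after the recall test.
        shown_ids = set(rng.sample(range(n_words), n_practice))
        shown = [memorability[i] for i in shown_ids]
        not_shown = [memorability[i] for i in range(n_words) if i not in shown_ids]
        diffs.append(statistics.mean(shown) - statistics.mean(not_shown))
    # Mean gap across runs, and how much a single run's gap varies by chance.
    return statistics.mean(diffs), statistics.stdev(diffs)

mean_diff, spread = simulate_split_difference()
print(f"average gap {mean_diff:+.4f}; typical single-session gap {spread:.3f}")
```

Averaged over many runs the gap sits very close to zero, so a consistent recall advantage for the shown words cannot simply be put down to easier words being picked. Any single session can still differ by chance, which is what the significance testing is there to handle.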
Of course it is not dull: if Bem is right, it challenges almost everything we take for granted about causality, the nature of consciousness, and how the world works. Precognition, sensing the future even by a few seconds, would have profound implications. What is more, Bem published his research not in one of the peer-reviewed parapsychological journals but in a (peer-reviewed) mainstream psychology journal, for maximum effect, one assumes. If you really want to get a grip on all this, you will want to read the paper itself; and let’s face it, I don’t think many of those who have weighed in on Twitter today have, so here it is for you.
And the news media lapped it up. It was hailed as a breakthrough piece of psi research, with everyone from New Scientist to pretty much every major newspaper running a piece on it. Parapsychology hit the news, and hit it hard. Yet I fear this tells us more about the way science is reported than about psi. I say that because I subscribe to LEXSCIEN, a journal database that specializes in psychical research, and so I knew this work was not as new as it was being made to look.
Now, as it happens, I had previously reviewed for the JSPR (a peer-reviewed parapsychological journal) a book of papers presented at a conference at Liverpool Hope (ANOMALOUS EXPERIENCES: ESSAYS FROM PARAPSYCHOLOGICAL AND PSYCHOLOGICAL PERSPECTIVES, edited by Matthew D. Smith. McFarland & Company, Inc., Publishers, Jefferson, North Carolina and London, 2010. 220 pp.), which contained some earlier experiments (or possibly the same experiments, later updated) by Bem on exactly this effect. This is what I wrote in that review:
The first essay, by Daryl J. Bem, outlines a series of experiments in presentiment, using the hypothesis of Precognitive Habituation. Habituation is the psychological process by which there is a decrease in response to a stimulus after repeated exposure over time; if we are shown a violent or unpleasant image, our response to it lessens with repeated exposure, and similarly with arousing or positive images. Therefore one might assume, if precognition were possible, that given two similar stimulus images, the one that was subsequently displayed (randomly selected) would be rated as less arousing than the other. The ‘habituation’, by which the photograph would be considered less horrific than the other, would take effect before the process of viewing that photograph repeatedly. So, for example, given two photographs of murder victims, the subject would rate as less disturbing the photograph that they would subsequently be shown many times, if habituation can occur ‘backwards’ in time. The experiments reported a significant effect on negative images; further research is clearly called for. The experimental set-up is relatively simple, and one wonders whether the software is available, and whether large-scale trials might be conducted over the internet.
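The trial logic described in that review can be sketched as follows. This is a simplified illustration under stated assumptions, not the software Bem distributed: each trial offers a pair of pictures, the participant first picks the one they find less disturbing, and only then does the computer choose, at random, which picture to show repeatedly. A “hit” is when the two coincide. With no precognition the hit rate should sit at 50%, so the sketch also computes a one-sided binomial p-value. The names `run_session`, `guessing_participant` and `binomial_p_at_least`, and the 36-trial session length, are hypothetical.

```python
import random
from math import comb

def run_session(n_trials, choose, rng):
    """Simulate one session of the forced-choice protocol; return the number of hits."""
    hits = 0
    for _ in range(n_trials):
        pair = ("picture_a", "picture_b")
        # Step 1: the participant picks the picture they find less disturbing.
        less_disturbing = choose(pair, rng)
        # Step 2: only afterwards does the computer randomly pick the picture to show repeatedly.
        target = rng.choice(pair)
        # Hypothesis: habituation working 'backwards' makes the future target seem less disturbing.
        if less_disturbing == target:
            hits += 1
    return hits

def binomial_p_at_least(k, n, p=0.5):
    """One-sided probability of k or more hits in n trials by chance alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def guessing_participant(pair, rng):
    """A participant with no precognition: the choice is unrelated to the later target."""
    return rng.choice(pair)

rng = random.Random(42)
n = 36  # hypothetical session length
hits = run_session(n, guessing_participant, rng)
print(f"{hits}/{n} hits, one-sided p = {binomial_p_at_least(hits, n):.3f}")
```

The key design point is the order of the two steps: because the target is drawn only after the choice is made, a participant without precognition cannot beat 50% on average, however the choices are made.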
And of course, all this has come to pass. Still, the cynic in me noted that the experiments were reported completely devoid of context, with no mention of the earlier papers on the effect. So, sensing this was nothing new, even if it was new to the news media, I searched LEXSCIEN for earlier papers. I have put the abstracts below, but that does not mean I expect you to read them all: the point is that this stuff, whether real or just down to methodological errors, has a history, and has been part of an ongoing research programme…
JSPR, January 2007
CAN A SLIDE-SHOW PRESENTIMENT EFFECT BE DISCOVERED IN BRAIN ELECTRICAL ACTIVITY?
by Thilo Hinterberger, Petra Studer, Marco Jager, Colette Haverty-Stacke and Harald Walach
The presentation of pictures evokes clearly detectable responses in the electroencephalogram (EEG). Here, the question is addressed whether people show an anomalous pre-stimulus response prior to a sudden appearance of pictures. Therefore, twenty participants were exposed at randomised times to affective and non-affective pictures, and to checkerboard stimuli. In a non-parametric statistical analysis the one-second pre-stimulus epochs were compared with arbitrarily chosen non-exposed pre-stimulus epochs. In a second step, the contrasts between the pre-stimulus responses of different conditions were tested for significance. Checkerboard stimulation revealed no effect, whereas the picture stimuli resulted in a significant increase of the EEG activity. For affective pictures as well as for the difference between affective and neutral pictures, significant z-scores greater than z = 2.0 were found. A control condition with a covered monitor did not show such an effect. The delta band power was only decreased before presentation of pictures. The results support the possible existence of an abnormal presentiment effect. As it is not visible in the averaged EEG curves, this effect may not be time-locked to the stimulus and may be different for each participant. The non-significant results for neutral pictures and checkerboard stimuli suggest that emotional affectivity is important for a pre-stimulus effect in the EEG.
JSPR, April 2009
Moderating Factors in Precognitive Habituation: The Roles of Situational Vigilance, Emotional Reactivity and Affect Regulation
In this experiment moderating factors of the so-called precognitive habituation effect were studied. The precognitive habituation effect refers to the apparent influence of later shown pictures or words on participants’ choice and preference ratings, which seem to be biased by habituation effects due to repeated display in the future, and so might be interpreted as an instance of precognition. In this study a number of modifications were introduced in the classic precognitive habituation protocol: (a) words and pictures were used as stimulus material, (b) a new individual difference was measured as a potential factor (affect regulation), and (c) subjects were primed into a reactive mindset in order to highlight the affective nature of the choice task. Only low-arousal positive and high-arousal negative stimuli were used. There was no significant main effect, but in accordance with previous results, subjects who scored high on emotional reactivity displayed a significant precognitive habituation effect, but only with high-arousal negative stimuli. Subjects high on affect regulation also showed a significant precognitive habituation effect for negative stimuli. The strongest effect was displayed by subjects who were high both on emotional reactivity and affect regulation.
The Journal of Parapsychology, April 2010
DO SOME OF US HABITUATE TO FUTURE EMOTIONAL EVENTS?
From an evolutionary perspective, it may be advantageous not only to unconsciously react to emotionally threatening stimuli but also to habituate to these if they should prove harmless. A major purpose of the study was to test for the occurrence of this precognitive affective habituation at a subliminal level using emotionally loaded pictures. The design chosen here enabled us to evaluate whether or not participants habituated to emotionally loaded pictures and to see if they reacted selectively to just those target pictures that would later be repeatedly exposed, thus becoming potentially less threatening. It was further hypothesized that both the subliminal and the precognitive effects would relate to individual measures of emotional reactivity and transliminality. Fifty participants took part in the two successive computer steered procedures in order to respectively evaluate these aspects. A significant habituation effect was found for the negatively loaded targets. The overall findings failed to show a significant discrimination between those pictures that would be re-presented and those that were not. However, by selecting out the 34 individuals who showed affective habituation, a post hoc significant effect of precognitive habituation was found. Keywords: precognition, psi, subliminal, affective habituation, emotional reactivity.
OK, so here is point one. The publication of a paper on this in a mainstream journal provoked a flurry of media interest, but a quick check revealed something no one saw fit to comment upon: the research was based on a lot more than one set of experiments performed by Bem alone. Bem had referenced this earlier work properly, with a full bibliography, but in all the hype it went pretty much unreported. So the research appeared divorced from its context, and was presented as something new and unique. Now, yes, I think Daryl Bem pioneered and created these experiments, and deserves the glory and the Nobel if it turns out to be good science, but the science reporting left a helluva lot to be desired, even in the science magazines.
Bem called for replications, and bent over backwards to make them possible, making the original software available for download so that anyone who wanted to could run a replication. So I downloaded it, had a look at the software, and tried to find flaws in how the experiment was conducted. Of course I was way behind: Richard Wiseman beat me to it, and Bem responded to Wiseman. At this point I pretty much completely lost interest, because, as I keep saying, psi is not my field. I’m sceptical by nature, and it is only the fact that so many of my fellow sceptics jump on bandwagons and uncritically accept any criticism of parapsychology, no matter how flawed, that makes me pay any attention at all to stuff like this.
I did consider running a trial: I was delighted when Richard Wiseman produced a registry for replications, where you could register your experiment beforehand to prevent file drawer effects. I could not get enough people into my basement to do it. I ran the software with myself and a housemate to test it, and we both scored negatively, against Bem’s hypothesis, and I wrote a few criticisms of the experimental design on forums, but I never actually ran an experiment. I knew plenty of others would: one major online replication is linked above.
Besides, by this time I had found two replication attempts, published before all the media hoo-hah but largely ignored in it. Alexander Batthyany had done a study in Vienna, which you can read here: “Retrocausal Habituation and Induction of Boredom: A Successful Replication of Bem (2010; Studies 5 and 7)”. And Galak & Nelson had failed to replicate Study 8: “A Replication of the Procedures from Bem (2010, Study 8) and a Failure to Replicate the Same Results”. Different experiments from the sequence, different results.
So I sat back and waited to hear more, if I had to, about the latest case for psi. So far I have been fairly scathing about the way this was all reported, but some good articles on how we handle statistics did come out of it, though they were partly redundant given a careful examination of classical probability and Bayesian approaches to experimental statistics published in one of the last issues of the European Journal of Parapsychology. Bem himself has now responded to the linked article.
All went quiet until a few months back, when I had the great pleasure of hearing Prof Chris French speak at a Sceptics in the Pub event. He mentioned that the journal that had published Bem’s study had refused to carry an article by him, Richard Wiseman and Stuart J. Ritchie reporting failed replications. My first thought on hearing of their trials was that the samples were small (fifty participants each, I believe), but I looked forward to reading the paper. I could not really see what the issue was, as in parapsychology since 1981 there has been what some call the Honorton-Hyman concordat: an agreement to publish any failed experiment, to reduce the risk of the file drawer effect (roughly, the bias you get when people only bother to write up successful experiments). A failure to replicate should be big news. It should be shouted from the rooftops because, after all, the whole Popperian notion of scientific method is based on falsification, for perfectly good philosophical reasons.
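The file drawer effect is easy to demonstrate with a toy simulation. This is a minimal sketch under stated assumptions, not anything taken from the papers discussed here: the true effect is exactly zero, each hypothetical study tests fifty participants, and only studies that look like successes (a one-sided z above 1.645, roughly p < .05) get written up. The function name `file_drawer_demo` is illustrative.

```python
import random
import statistics

def file_drawer_demo(n_studies=2000, n_per_study=50, seed=7):
    """Simulate studies of an effect that does not exist; 'publish' only the significant ones."""
    rng = random.Random(seed)
    all_effects, published = [], []
    for _ in range(n_studies):
        # True effect is zero: each participant's score is pure noise (mean 0, sd 1).
        scores = [rng.gauss(0.0, 1.0) for _ in range(n_per_study)]
        mean = statistics.mean(scores)
        se = statistics.stdev(scores) / n_per_study ** 0.5
        all_effects.append(mean)
        # Only 'successful' studies leave the file drawer (one-sided z > 1.645).
        if mean / se > 1.645:
            published.append(mean)
    published_avg = statistics.mean(published) if published else float("nan")
    return statistics.mean(all_effects), published_avg, len(published)

true_avg, published_avg, n_published = file_drawer_demo()
print(f"all studies: {true_avg:+.3f}; published only ({n_published}): {published_avg:+.3f}")
```

Averaged over every study, the effect sits near zero; averaged over only the “published” ones, a clearly positive effect appears out of nothing. That is why registering studies in advance, and publishing the failures, matters.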
So while, technically, the original journal’s refusal to publish was not a huge problem, because several peer-reviewed parapsychological journals would carry the article, a much bigger issue remains. Peer review is the reason we trust our science, but in fact referees do not replicate a paper’s experiments: they read through it and concern themselves with any obvious flaws in presentation, analysis or methodology. Bad papers are picked up when other scientists try to repeat the experiments, that is, through replication. HOWEVER, and it is a huge HOWEVER, most papers probably never get tested this way. Replication means doing all the original researchers’ work over again, and if the results are not that interesting or controversial, no one cares enough, can afford to, or wants to spend research time on it, as it would be career suicide. If you do replicate at all, you are usually trying to build on the original findings, which means you tweak the experiment in some way, adjusting the variables. So very few true replications occur, and journals don’t normally like to carry them, as they are just repeats of earlier work. One would have thought a falsification was of interest, but in fact many journals appear to have “no replication” policies. As academia and science are publish-or-perish environments, there is a HUGE disincentive to bother at all. I have written about this before, and I was gladdened when the excellent Ben Goldacre dedicated one of his Guardian columns to the journal’s failure to carry Wiseman, French and Ritchie’s paper. This is not just a failure in this one case: it is a huge problem in the way we do science.
Still, French et al. had every chance to publish their findings in a parapsychological journal. Many of the sceptical readers of my blog will think, “Why on earth would they do that?” After all, parapsychology is bunk, right? Nope. I have in front of me Anomalistic Psychology, a book by Chris French himself, with David Luke, Nicola Holt and Christine Simmonds-Moore. In the concluding chapter of this 2012 book we read the following:
In the chapter on pseudoscience it was identified that parapsychology has all the hallmarks of a science, falling a little short on some of the benchmarks of good science, but performing better than mainstream science on others. (p.193)
Wiseman and French have both published extensively in the parapsychological journals. Nope, something far bigger was at stake here.
Had Bem not published in a mainstream journal this time, with all the ensuing media hoo-hah, this would have remained part of the ongoing parapsychological discourse: previous articles had reached an audience of hardcore parapsychologists, book reviewers in that vague area like me, and a psychic dog. (OK, I made up the psychic dog.) But he did, and Bem’s article reached half the world through the media. So, to correct the assumption that the research was unchallenged, French, Wiseman and Ritchie wanted mainstream exposure, and failed to get it through any well-known print journal. They have acted honourably enough, and in an academically sound way, but failed replications are not as exciting as “Can We See The Future?” headlines. They wanted to reach the same audience Bem did. I fear that by this point the people interested in the science had long since lost interest; now we are seeing the beginning of a war of hype and spin, in which parapsychological true believers and entrenched sceptics tweet the details of the latest revelation, and any analysis of the paper flies out of the window.
I’m not going to analyze the paper either: it was posted here, and you can read it yourself. They have got the coverage: the Daily Mail, and @StuartJRitchie has just posted on Twitter that they have made the front cover of Der Spiegel! Again, I’m not accusing the chaps of being shallow publicity whores; what they have done is respond in equal measure to the sensation provoked by Bem’s original paper. I doubt, however, that they will welcome my comments, despite the fact that I have tremendous admiration for them as hard-working, intelligent critics and parapsychologists. Anyway, you probably want to read it now!
What concerns me is that throughout the UK, hundreds of well-meaning sceptics, without the slightest knowledge of any of the experiments, the papers or the background, will tweet this with an “ESP is Dead!” subtext (or overt sneer), and science culture will become even more hostile to this kind of research in the future. That is not a rational, considered opinion. Well, it might be, if there were no further discussion and Bem had immediately conceded defeat.
Of course no such thing has occurred. Bem went immediately to work defending his research. The key bit is here…
Nevertheless I consider it premature to conclude anything about the replicability of my experiments on the basis of this article. First, in mainstream psychology it usually takes several years before enough attempted replications of a reported effect have accumulated to permit an overall analysis (often called a “meta-analysis”) of the evidence—20 years in the example described below. It usually takes busy researchers several months to find the time to design and run an experiment outside their primary research area, and my article was published only a year ago.
In their article, Ritchie et al. mention that their experiments were “pre-registered.” They are referring to an online registry set up by Wiseman himself, asking anyone planning a replication to pre-register it and then to provide him with the data when the study is completed. As he noted on the registration website: “We will carry out a meta-analysis of all registered studies…that have been completed by 1 December 2011.”
By the deadline, six studies attempting to replicate the Retroactive Recall effect had been completed, including the three failed replications reported by Ritchie et al. and two other replications, both of which successfully reproduced my original findings at statistically significant levels. (One of them was conducted in Italy using Italian words as stimuli.) Even though both successful studies were pre-registered on Wiseman’s registry and their results presumably known to Ritchie et al., they fail to mention them in this article. I consider this an important omission. (I also note that Ritchie et al. describe their replication attempt as three independent studies, but the total number of sessions they ran was the same as the number I ran in my own original experiment and its successful replication.)
Second, it takes several years and many experiments to figure out exactly which variables in an experiment affect the results. Consider, for example, an attempt to assess the replicability of a well-known effect in mainstream psychology known as the “Mere Exposure Effect,” first brought to the attention of psychologists in 1969: Across a wide range of contexts, the more frequently humans or other animals are exposed to a particular stimulus, the more they come to like it. Twenty years later, a meta-analysis of over 200 mere exposure experiments was published, showing a significant overall effect; it is now widely accepted as a “real” and replicable phenomenon. But the same meta-analysis reveals that the effect fails to replicate on simple stimuli if other, more complex stimuli are presented in the same session. It fails to replicate if too many exposures are used, if the exposure duration is too long, if the interval between exposure and the assessment of liking is too short, or if participants are prone to boredom. As a result, the meta-analysis included many failures to replicate the effect; several of them actually produced results in the direction opposite to prediction. In short, it takes many more than three replication failures to conclude anything about an alleged effect.
You can read the rest of Bem’s reply here.
Bem’s language is measured here, but he makes what appears to be a bizarre claim: that Wiseman, French and Ritchie failed to mention at least three studies they were aware of, which had been registered with Wiseman’s anti-file-drawer registry. This sounds like heinous wrongdoing, especially as, according to Bem, two of those experiments were successful replications at statistically significant levels. So is Richard Wiseman really the pantomime villain some psi enthusiasts seek to portray him as, suppressing this research?
Nah. French, Ritchie and Wiseman reported on their own experiments, nothing more, nothing less. Bem can post links to these successful replications; after all, I have already mentioned one, and one failure to replicate (which, as I recall, slightly altered the methodology by omitting the pre-trial relaxation sequence, though that may have been yet another study in a recent JSPR; I’d have to check). Bem begins by conceding that they seem to have made a good-faith effort to replicate, but the omission of the other studies, if they really were registered and the authors were aware of them, can be seen either as part of a war of spin, or simply as reflecting the fact that it is not usual to pre-empt others’ work by publishing their findings before they do: sheer academic politeness. Psi advocates will believe the worst and sceptics who have read this far the best (as I do), but either way, as the press go mad for the story, all sense of proportion, Truth and Science will probably get lost.
UPDATE: A few hours after I posted this, Richard Wiseman replied on Twitter:
Hi -re blog, we were just reporting our studies. The pre-registry meta-analysis is different and has been submitted for publication.
From reading Stuart Ritchie’s comments, it is clear that the studies Bem referred to are not yet published, so, as I suggested above, they were left out as a matter of academic courtesy. The meta-analysis will (as I actually said in this piece’s conclusion) follow in due course.
Don’t believe me? Have a look at The Daily Grail’s article on LiveScience’s reporting of Bem’s statement. This is little short of scandalous, and LiveScience should be ashamed of itself. It’s nothing new, though: I recall the nonsense the New Scientist site posted on the reissue of a Sheldrake book, which was amusingly torn into by commenters on their own site, who saw it as the misrepresentation it was. I’m not a big fan of Sheldrake, but this was disingenuous at best, and New Scientist (one of the best science magazines) should be embarrassed it hosted it.
So what do I think of Bem’s response? I have written on Experimenter Effects before, and Bem invoked them in his original paper by citing a famous paper by Wiseman (in fact, two). Here is the abstract of one:
Journal of Parapsychology, Vol. 63, 1999, p. 236
EXPERIMENTER EFFECTS AND THE REMOTE DETECTION OF STARING: A REPLICATION
RICHARD WISEMAN AND MARILYN SCHLITZ
ABSTRACT: Both authors recently ran experiments to discover whether people can psychically detect when another person is staring at them. R. W. is a skeptic regarding claims of parapsychology and M. S. is a psi proponent. R. W.’s studies obtained chance results while M. S.’s study obtained statistical significance.
The authors then carried out a joint study to help determine why their experiments had obtained different results. M. S. and R. W. acted as separate experimenters for two different sets of trials. These trials were carried out at the same time, in the same location, used the same equipment, drew participants from the same pool, and employed the same procedures. The data from M. S.’s participants were statistically significant while the data from R. W.’s participants were not. This paper describes an attempted replication of this initial joint study.
Participants were hooked up to a computer that recorded their electrodermal activity (EDA). A videocamera was placed in front of the participant and it fed an image of them to a monitor located in a separate room. Each experimental session consisted of thirty-two thirty-second periods. Half of these periods were randomly allocated to a “stare” condition and half to a “non-stare” condition. During the stare condition, the experimenter looked at the monitor and attempted to remotely influence the participant’s EDA. During the non-stare condition, the experimenter looked away from the monitor. The EDA of R. W.’s participants was not significantly different between the two conditions. In contrast, the EDA of M. S.’s participants was significantly lower during the stare than non-stare periods. The paper discusses competing interpretations of these results and possible future research in this area.
Perrott Warrick Research Unit
My understanding is that subsequent experiments have cast doubt on the experimenter effect, and I may well write on this at some point soon, but really, Bem’s comments are hardly unfair. If psi exists, and is a by-product of the human brain or consciousness in some way, we might, I guess, expect exactly this kind of thing: the experimenter as a variable.
For now, I’m going to avoid getting involved. I very much doubt that the habituative precognition effect is real, but it will take time, more studies, and Wiseman’s forthcoming meta-analysis of all the registered studies to demonstrate it one way or the other. I’m going to end with two sobering thoughts, though. First, I don’t think we can trust the science press, let alone the popular press, to give you a reasoned understanding of ANY scientific research, not just parapsychological stuff. Secondly, we live in a culture here in the UK where “sceptical” responses are met with huge enthusiasm, just as in the Mid-West of the USA “Jesus did it” type articles get huge applause and re-tweets, because our culture is increasingly defined by a gut-instinct sceptical denialism that can be every bit as dogmatic. I don’t mean informed commentators like French, Wiseman and Ritchie, whom I respect enormously for doing all this work, but those who have abdicated their responsibility for critical thought and informed, rational commentary by simply accepting media spin as truth.
But I would say that. I had to bore myself to death reading a whole load of psi articles: you should all suffer with me.
UPDATE 2: Last night Stuart J. Ritchie kindly tweeted to draw attention to the authors’ reply to Bem’s reply, which has just been published. This is the first opportunity I have had to update the post since, but please do read it. It clarifies and answers some of Bem’s criticisms. I was slightly confused as to why the (psychological) Experimenter Effect was mentioned at all, since it is the (parapsychological) Experimenter Effect that is relevant, though given how misspelt words are handled by the researcher, I guess there could be scope for both.
What was of interest was this:
However, even if the Wiseman-Schlitz experiments were conclusive evidence of experimenter effects, there is still no analogy from them to Bem’s experiments. This is because, in the Wiseman-Schlitz experiments – which investigated ‘the sense of being stared at’ – the experimenter was heavily involved in the procedure, being the ‘starer’ who directed their gaze at participants, who were to report if/when they sensed that gaze. Bem’s experiments, on the other hand, are run and scored entirely by the computer program, with experimenters only greeting and debriefing participants. In addition, the staring experiments measured an emotional/perceptual outcome, which is quite different from the basic word memory test involved in Bem’s experiment.
Finally, it is worth pointing out, as we do in our paper, that only one of us (SJR) personally ran their replication attempt; two of our replication attempts were run by research assistants. Thus, the participants in those samples did not encounter either of the two of us who are, to quote Bem, ‘well known as psi skeptics’ (RW and CCF).
Yep, so Wiseman and French’s personal involvement was limited. We might wonder whether their research assistants are similarly sceptical, but from what I know of Wiseman’s PhD students (I only know one former student) they are an open-minded bunch. I was rather disappointed by one thing: the Daily Mail misattributed Stuart Ritchie’s words to Wiseman, perhaps because the public know Wiseman, and failed to mention Ritchie at all. I have never met Stuart, but judging by his Twitter feed he is clearly an able, intelligent and good bloke, and he deserves full recognition, especially as he was apparently the lead researcher on much of this. I did find his blog listing publications here. Given that we have similar interests in religion and psychical research, I will watch his work in future with interest.