Making Heads or Tails of Richard Wiseman on CSI

Richard Wiseman is a great bloke: charming, intelligent, and always fun to read or see. I was excited today when Dr. Ciaran O’Keefe mentioned on Facebook an article by him on the CSI (formerly CSICOP) webpages, from the latest issue of the Skeptical Inquirer. You might want to read the article – it is interesting – and my thoughts, a slight critique of said article, follow.

I find myself in disagreement with much of the article, and have taken a little time here to explain why. As always, I welcome comments.

Wiseman wrote —

Parapsychologists frequently create and test new experimental procedures in an attempt to produce laboratory evidence for psi. Most of these studies do not yield significant results. However, rather than being seen as evidence against the existence of psychic ability, such null findings are usually attributed to the experiment being carried out under conditions that are not psi-conducive.

Let us start by examining this…

Evidence & Absence

Let us assume we are attempting to discover if a river is polluted. We devise experiments to test for a pollutant, say Tabasco Sauce. These results are negative: there is no Tabasco Sauce in the river. We test for twenty other substances, with negative results. Is the river polluted?  Who knows?!!

Science works (on the Popperian model) by falsifying statements. We can say that the river is NOT polluted by 21 substances. We can not say the river is not polluted (by these tests).

Now imagine we are looking for the little known Pink Fraggle moth. The moth is said to live in the industrial parks of Derby, so we send out search teams, who find no evidence of the moth. We set up moth traps. Nothing. Does that disprove the moth? Of course not. It reduces the strength of the argument for its existence, but does not falsify it.

OK, silly examples. Let’s move on to something more appropriate. Let us say we are looking for evidence of the swine flu strain in humans. We test thirty patients suffering flu symptoms – none appear to have anything other than common known flu viruses. So what do we conclude? No swine flu? Of course not.

Hopefully you the reader can now see the problem with this argument. I’m sure we all realise  that absence of evidence is not evidence of absence: I have no evidence that Becky has bought me a Valentine’s Day present, but that does not mean she has not. We can’t judge that. Insufficient data!

The problem to my mind is that the psi hypothesis is pretty loosely defined. So let’s say we are looking for evidence of telepathy or clairvoyance (either will do), and run Ganzfeld tests. (Look up the Ganzfeld Experiment if you are not familiar with it.)

Subject in Ganzfeld Experiment: from wikimedia commons

Imagine Becky runs a series of 100 trials in Derby, I run 100 in Cheltenham, and Ed runs 100 in Buxton. My experimental subjects score an average of 27.5% hits, where chance is 25%.  Becky’s score 22.5%. And Ed’s score 75%. So what do we make of this?

We remove Ed’s results as an outlier. We assume, rightly or wrongly, that something went wrong in his experimental set up (this actually happens in many forms of meta-analysis). Becky’s results and mine average out at exactly 25% – chance. Does this disprove the hypothesis of ESP in this case?

Of course not. It certainly would not support it, but imagine for a moment that Becky and I checked each other’s procedures, and then repeated the experiment six more times, with the same results (or very close each time). A small positive score for my subjects, a small negative score for Becky’s. If the situations were sometimes reversed, and my subjects scored 22.5%, and hers 27.5%, well, no big deal. But if we see a consistent effect in one direction or the other linked to the experimenter – we have something very odd indeed, and one of the few things which has been put forward as suggestive of a real psi effect – the (psi) Experimenter Effect*. If Sargent’s subjects consistently do well, and Blackmore’s score at chance, there could be something wrong with Sargent’s randomization process, it could be ‘Sceptical Fairytales from Bristol’, or it could just be that the experimenter is an uncontrolled variable in the experiment.
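
For anyone who wants to check the sums in the three-lab example, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `binomial_tail` is my own, and because 27.5% of 100 trials is not a whole number of hits I round it up to 28 for the significance check.

```python
from math import comb


def binomial_tail(hits, trials, p_chance):
    """Exact probability of scoring `hits` or more by chance alone."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


CHANCE = 0.25  # one target among four, as in the example
TRIALS = 100   # trials per experimenter

# Hit rates (percent) taken from the example above.
hit_rates = {"CJ": 27.5, "Becky": 22.5, "Ed": 75.0}

# Drop Ed as an outlier and average the remaining two labs.
pooled_without_ed = (hit_rates["CJ"] + hit_rates["Becky"]) / 2

# How surprising is each result if only chance is at work?
# Assumption: 27.5% is rounded up to 28 hits out of 100.
p_cj = binomial_tail(28, TRIALS, CHANCE)
p_ed = binomial_tail(75, TRIALS, CHANCE)

print(f"Pooled rate without Ed: {pooled_without_ed}%")
print(f"P(28 or more hits by chance) = {p_cj:.3f}")
print(f"P(75 or more hits by chance) = {p_ed:.2e}")
```

On these numbers my 27.5% is nowhere near significant on its own, while Ed’s 75% is so far from chance that an error in the set up is a natural first suspicion.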

A quick footnote

A (psi) Experimenter effect is one where the result seems to be down to “paranormal” influences. Mundane Experimenter effects in psychology are a well known factor, which double blind tests, anonymous scoring by others and so on are designed to prevent. For example, imagine I was teaching a class of Ed, Beast and Becky on medieval church history. I set a timed essay. I know Becky knows nothing about the subject, Beast is an authority, and Ed a walking encyclopaedia – wikipedia on legs. I therefore expect to see better work from them than from Becky, and may over mark theirs, and penalize Becky. This is one very basic example – and why many university departments these days have anonymous marking – the marker does not know who the essay belongs to. And if you fancy your students and are influenced by that, then marking Becky higher may be a real issue, as I just don’t have a thing about Ed or Beast! There are many more Experimenter Effects and Observer-Expectancy Effects. However, if you really want to understand mundane experimenter effects, there is a useful documentary clip on YouTube (contains sound) that is DEFINITELY excellent! (Yes, I know it’s Ghostbusters!)

So back to the variables —

So the experimenter is an uncontrolled variable? You what? OK, some basic science is needed here. In a classic experimental design, we want everything to be under control – well, most things! Let’s assume we want to see what temperature pure water boils at. We have been told it is 100 degrees Celsius. So we set up a very simple experiment – we boil thirty water samples, and measure the temperature. We have several variables – the water, the way we boil it, the place we boil it, etc, etc. The water is pure in each sample, not taken from a tap or a puddle, as contaminants might change the boiling point. Add a cup of salt and it will boil at a slightly higher temperature. So the purity of the water is a controlled variable. We do each experiment in the same place – controlling that variable. We do not however take into account what shoes we are wearing – whether the experimenter is barefoot, wearing sneakers or slippers is probably pretty irrelevant – so we do not control that variable – it is an uncontrolled variable, but to be honest it does not make a difference. If our experiments all show that water boils at 71 degrees Celsius we realise that we need to control the variable “place of experiment” much more – we were experimenting on top of Mount Everest – so instead we move the experiments to sea level, and 100 degrees is now the result. Atmospheric pressure or altitude is clearly a variable in this experiment.


more efficient on top of Everest?

The problem is many variables are simply unknown variables: we don’t know all the factors which impinge on the result. So we look for them, and conduct more and more experiments, controlling or assigning uncontrolled variables each time. No single experiment can control more than a few potential variables.

OK, a silly example. However now we can use variables properly. The controlled variable (purity of water) remains the same in our next run of tests. However with a hot air balloon we now boil water at thirty different altitudes as the balloon drifts randomly up and down, and test the results. The height is again a variable: our data reflects the impact of the  altitude on the temperature at which it boils.
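
The balloon experiment can be roughed out in code. This is a sketch, not a lab-grade model: it assumes the standard-atmosphere barometric formula for pressure against altitude and the Antoine equation for water (constants for roughly 1 to 100 degrees Celsius, pressure in mmHg). The function names are my own invention.

```python
from math import log10

# Antoine constants for water, valid roughly 1-100 °C, pressure in mmHg.
A, B, C = 8.07131, 1730.63, 233.426


def pressure_mmhg(altitude_m):
    """Approximate air pressure from the standard-atmosphere formula."""
    pascals = 101325 * (1 - 2.25577e-5 * altitude_m) ** 5.25588
    return pascals / 133.322  # convert pascals to mmHg


def boiling_point_c(altitude_m):
    """Temperature at which water's vapour pressure equals air pressure."""
    return B / (A - log10(pressure_mmhg(altitude_m))) - C


# Altitude is the variable we change; water purity stays controlled.
for altitude in (0, 1000, 3000, 8849):
    print(f"{altitude:>5} m: water boils at about {boiling_point_c(altitude):.1f} °C")
```

Sea level comes out at about 100 degrees and the summit of Everest (8,849 m) at roughly 70, in line with the figure used in the example above.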

Now let’s say we decide to ESP test all the girls I know. The gender is a controlled variable – I am only testing women. Their eye colour however is uncontrolled – it’s purely random, as I do not think it has any bearing on the experiment. What if it does though? What if green eyed girls carry the psi gene and score ten times higher than brown eyed girls? My results will not demonstrate the relationship, as I did not look for it. In fact, what results I get will be down entirely to the random composition of my sample… as I know a lot of Northern European women, I might score higher in my experiments than my colleague in Nairobi might. We might falsely decide Northern latitudes are psi conducive. (Incidentally the example is facetious – eye colour has nothing to do with psychic ability, as far as I know!)

Now this is a problem that faces every experiment: we do not know what variables are actually important. So we design an experimental hypothesis, and test that hypothesis, nothing more, nothing less. Let us take the hypothesis that CJ is more attractive to women wearing his green pullover. Now women’s reactions to my pullover will vary – so I set out to a club, and spend half the night wearing a black polo neck, and don’t get kissed once. I then try again, wearing my green pullover for the second half of the night – and get snogged ten times. What does this tell us? We might tell our friends my green pullover is “lucky” and attracts women. In fact we have failed to control a key variable – alcohol intake on the part of our test subjects. Only the utterly paralytic would pull CJ in that pullover – but as 2am approaches inhibitions are lowered by alcohol consumption.

So CJ’s girlfriend invents an improved methodology – she shows a sample of fifty random women lots of photos of men in different clothes, including two of CJ: one in his green pullover, one in his black pullover. Each photo is rated for attractiveness by the women, and the results are studied: the effect of the green pullover is shown to be significant – it makes me less attractive, if such a thing is possible! She has controlled the variables that matter. Well, maybe: it could be that the positioning of the photo of CJ also matters. If the preceding slides were Albert Steptoe, then the Hunchback of Notre Dame, then CJ, he may rate higher in the green pullover. If the previous slides were of George Clooney, Brad Pitt, and Robert Pattinson, CJ may be rated less attractive, though obviously he is actually far more attractive than those three. 🙂 So she randomly varies the order of the slides as shown – the order of the slides is now an uncontrolled variable – to average out this contextual effect. (Or she just shows two slides, of CJ!)


Me – pretty hideous whatever I wear!


So, Cherry-Picking New Procedures?

So the CSI(COP) article features the headline “cherry picking new procedures”.  Yet in the development of a scientific hypothesis this is all normal practice. We do not know what factors are significant to the outcome. We instead begin with a hypothesis, and then test it, adding or removing controlled and uncontrolled variables to try and ascertain the cause.

Imagine Balders’ car breaks down. CJ, Ed and Balders all start looking at it. We know the engine no longer runs, but have no idea why. Balders checks the water and oil, Ed looks to see if we still have petrol, and CJ postulates it is caused by the UFO parked in TESCO car park. None of the above explain the breakdown, and so we move on, trying more and more things, tinkering, and experimenting. Eventually we discover the cause, and repair it – but we have modified the hypothesis many, many times in the process.

I know, I love silly examples – but seriously, imagine instead we have a sudden loss of sticklebacks in a pond. We test for many things – pollutants, big fish with a taste for sticklebacks, herons, homosexuality in the stickleback population reducing breeding – and we eventually identify the cause. Would anyone accuse the experimenters of cherry picking, or ignoring negative results? No, of course not.

Another example: a class of children in a school have consistently better grades than a second class. Both classes are randomly selected. We test if the teachers are responsible – but when they swap teaching the classes, both remain the same – the “good” children still score higher. We examine the “random composition” of the classes – but they seem roughly equivalent in background and socio-economic group. We look for disruptive children – the level of disruption appears the same in both groups. Then we swap the class rooms over – and the results change immediately, also switching… So we test the classroom lighting, the heating, and so on. Most of these tests prove null – but that is not special pleading or cherry picking, if we then concentrate on publishing the actual cause when identified – say that one class room faces south, and receives far more natural light than the other, or similar. In fact when written up all the preliminary testing may only get a paragraph; it is mentioned to show that this was the real cause, but it is not deemed of great interest. To accuse the educational psychologist of cherry picking their results because they have tried many different tests would be bizarre!

empty classroom photo from Psychology Today

can science solve the mystery of the “cursed” classroom?

And this is what parapsychologists are doing. Enough experimental evidence exists to suggest that there is some ESP effect – that is not to say it is “paranormal” in any way; ESP might really stand for “error some place”, and be down to flawed experimental design or analysis. Nonetheless, given the unknown nature of the process causing the anomaly, we expect scientists to “frequently create and test new experimental procedures in an attempt to produce laboratory evidence for [the hypothesis]. Most of these studies do not yield significant results.” Yes, absolutely normal. I fail to see Wiseman’s point.

However, rather than being seen as evidence against the existence of psychic ability, such null findings are usually attributed to the experiment being carried out under conditions that are not psi-conducive.

That is surely because they are not evidence against the existence of psychic ability? The fact heating levels are the same in both classrooms is in no way evidence against the claim that class A scores better in double blind tests than class B! We must “cherry pick” our variables to try and isolate the cause of the phenomenon. When studying any anomaly dozens of tests may be needed: the fact that fifty prove negative does not invalidate other tests which do find a causal relationship.

Replication, Replication

Now once we have an actual experiment that seems to show an effect, we might want to test it by replication. And here we finally might have some sense in the critique, for most ESP experiments do not produce consistent results. As I suggested earlier, Ganzfeld studies show on average perhaps a small effect, and critics of parapsychology are keen to point out the lack of consistent replicable findings. I have no issues with that being a very real problem for ESP proponents – but I will discuss that in a moment. Yet this is NOT Wiseman’s claim here – he talks of

 "Parapsychologists  frequently create and test new experimental procedures
 in an attempt to produce laboratory evidence for psi."

Sneaky parapsychologists! No, as I have said, this is perfectly normal experimental science. Failed replication is not what he is getting at – in fact I honestly can’t see his point. He states that “such null findings are usually attributed to the experiment being carried out under conditions that are not psi-conducive”. Yes: because the aim of the experiment is to discover which conditions (if any) are psi-conducive. No one believes any more, as far as I know, that psi works on demand, for all people, and provides consistent results. If it did it would be established science by now. So endless papers are written, and endless experiments conducted, to see what may or may not underlie the many experiments which do seem to produce ESP at a significant level. No one expects to “prove ESP”; instead we are trying to locate and rule out factors that may or may not explain why, if it does exist, it is such a marginal phenomenon. Of course we could go the way of Blackmore and Savva, and say “bish – the positive results are flukes, the evidence is too weak to be worthy of further consideration”. To do so, however, is simply to make a judgement call – it in no way invalidates the work of those experimenters who continue to refine and develop new experiments on different variables to try and explain the anomaly.

As I have already suggested in passing, it is even worse. The experimenter themselves MIGHT be a factor in the experiment: we are dealing with a bizarre phenomenon perhaps loosely associated with consciousness in some way. IF psi exists, and IF we all possess psi potential, it could be that a sceptic inadvertently uses their own psi to negate hits. Ludicrous? Yep, it sounds so, but there has been a huge amount of work done on Sheep/Goat effects in parapsychology. Philosophically there is no reason to question that it may be so – after all, are belief, expectation and attitude really that unlikely to be important variables in something like ESP? They are certainly variables in many other mental and perceptual phenomena, as we see in endless experiments. Why would they be less likely to be involved in the accuracy of a telepathic experience than, say, that of a witness to a road traffic accident?

Let’s move on…

So far I have spent a while unpacking and questioning just one paragraph of Wiseman’s piece. Is there more substance in the article? Wiseman now talks about the file drawer effect. I’ll summarize the concept briefly – imagine that we are looking at incidents of criminal behaviour in teenagers who play video games. 50 studies are conducted, and 45 show no relationship between teenagers who play “Wargasm 2: the Granny Bashing” and subsequent criminal behaviour. 5 show however that teenagers who play the game turn into thugs. Now imagine the Journal of Computer Game Violence only publishes 7 reports, of which 5 are the ones which found a positive relationship. The others are just consigned to a file drawer and forgotten, and the reading public are led to believe all the evidence is that teenagers are adversely affected by this game.

shot from Bully computer game

SPR members quietly discuss Richard Wiseman’s article with him?

And of course it’s worse, it always is. Imagine that a group of psychology undergrads, say a class of thirty uni students, do the experiment. Regardless of the results, they get their assignments back and use them to eat pizza off. They are never published. Except one study, which showed freakishly high levels of violence inspired by the game, is published – because it happens to be well written. So if we look at an average of all the published papers – well, we find that yep, the meta shows a strong positive correlation – but it never existed; it’s just that no one wants to read a thirty page article on a card guessing experiment that showed bugger all results. Exciting, unexpected results make for fun science: we don’t write up every single experiment. So we have a selection bias – very few ghost hunting groups write up all their nights of tediously sitting around listening to the rain, and bird spotters don’t mention all the times they failed to see a Lesser Spotted Lovelace Throatwarbler. The file drawer effect is well known in science as publication bias.
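
A toy simulation makes the point. The sketch below assumes a card guessing style task with a 25% chance hit rate and no real effect at all, runs a large batch of imaginary studies, and “publishes” only those that beat chance at the conventional 5% level. All names and numbers are illustrative assumptions, and I run far more studies than the fifty in the example so that the averages are stable.

```python
import random
from math import comb

CHANCE = 0.25  # assumed chance hit rate (one target in four)
TRIALS = 100   # assumed trials per study


def binomial_tail(hits, trials, p_chance):
    """Exact probability of scoring `hits` or more by chance alone."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def run_studies(n_studies, seed=1):
    """Simulate studies where only chance is at work; return hit counts."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < CHANCE for _ in range(TRIALS))
        for _ in range(n_studies)
    ]


all_hits = run_studies(2000)

# The file drawer: only "significant" studies make it into print.
published = [h for h in all_hits if binomial_tail(h, TRIALS, CHANCE) < 0.05]

mean_all = sum(all_hits) / (len(all_hits) * TRIALS)
mean_published = sum(published) / (len(published) * TRIALS)

print(f"Studies run: {len(all_hits)}, published: {len(published)}")
print(f"Mean hit rate, every study:     {mean_all:.3f}")
print(f"Mean hit rate, published only:  {mean_published:.3f}")
```

Every study here is pure chance, yet the published subset averages well above 25%, which is exactly the illusion an uncorrected meta-analysis would inherit.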

Could the evidence for ESP be just down to this file drawer effect – the positive (by sheer chance) experiments are written up as sexy and interesting, the rest discarded? Could be! In fact, the problem is well known in parapsychology – to test how well known, I looked at references to the term “file drawer effect” in the LEXSCIEN on-line parapsychology journal database.

I chose two terms. We might expect “ghost” to be a fairly common term in parapsychology journals – it crops up 590 times in the last ten years. “File drawer effect” turns up 57 times. That’s pretty frequent? Parapsychologists certainly are NOT unaware of the problem. I had a quick look at the journals, and in five minutes picked out —

Scargle, J. D. (2000). Publication bias: The “file-drawer” problem in scientific inference. Journal of Scientific Exploration, 14, 91-106.

Dalton, K. (1994). A report on informal ganzfeld trials and comparison of receiver/sender sex pairing: avoiding the file drawer. Proceedings of Presented Papers: The Parapsychological Association 37th Annual Convention.

Scargle, J. D. (2006). Comment on: “A Critique of the Parapsychological Random Number Generator Meta-Analyses of Radin and Nelson”. Journal of Scientific Exploration, Vol. 20, No. 3, p. 420.

Breedervelde, H. A minor contribution to solving the file-drawer problem. SPR, Volume 63, 1998-1999.

I found this cute ghost on the web – easier to illustrate than “file drawer effect”

In fact the problem is so well known that it, and the ways in which it is addressed in parapsychology (and psychology), are discussed in a number of standard psychology text books –

Textbook authors criticized parapsychological research for lack of methodological rigor, improper statistical analyses, and inadequate randomization of stimulus materials. Four of the 1990s texts and five of the 2002 texts discussed the "file-drawer problem," referring to the possibility of publication bias in favor of positive findings and against negative findings. In the 1990s and 2002 samples, only Atkinson et al. (1996, 2000), Smith, Bem, and Nolen-Hoeksema (2001), and Hayes (1998) countered these claims with reference to the role of meta-analytic techniques in evaluating this problem. Some texts inferred that parapsychologists might not be aware of this problem. For example, Baron and Kalsher (2002, p. 116) stated, "While studies suggesting that psi exists may represent a small sample of all research conducted on this topic, perhaps only the few experiments yielding positive results find their way into print; perhaps the many 'failures' are simply not reported."

from Ferrier, G (2003) The Coverage of Parapsychology in Introductory Psychology Textbooks: 1990-2002, The Journal of Parapsychology, 67, Spring 2003 (pp. 167-179)

So hardly obscure, and of course meta analysis, where this problem rears its ugly head, has long since evolved statistical means that are used to address the issue. Dean Radin has discussed in some depth in his latest book the exact file drawer compensation methods used in his meta analysis – I shall leave it there as too technical for this blog! An example of a claim made about Radin’s work:

  Meta-analysis of the results of 832 RNG studies by Radin and Nelson in 1988 showed an
 effect upon output with odds against chance of a trillion to one (no fewer than 54,000
 file-drawer studies at chance level would be needed to nullify the significance of these results).

They look harmless, but plenty of people die when “file drawer effects” include toppling over on them each year. These modern models should be safer!

Now are there 54,000 unpublished parapsychological studies on RNGs (random number generators)? As so few people are involved in parapsychological research: no. The 832 studies may represent 10% of those conducted, I guess, so we might allow for 8,320 studies in total, some 7,500 of them conducted and not published, but 54,000? Sorry, no dice! Now as Scargle has pointed out, methods used to compensate for the file drawer effect ARE inherently unreliable (certainly with small datasets) – but that problem afflicts all science, and the debate continues amongst statisticians. There is no reason to believe parapsychology suffers from this any more than ANY meta analysis – and indeed parapsychologists seem to me to be generally very well informed on the issue.
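
For the curious, one common way of putting a number on the file drawer is Rosenthal’s “fail-safe N”: how many unpublished studies averaging a null result would be needed to drag a combined result back below significance. The sketch below uses that formula with made-up z-scores; I am not claiming it is the exact method behind the 54,000 figure quoted above, and the function name is hypothetical.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N for a set of study z-scores.

    Solves (sum of z) / sqrt(k + N) = z_alpha for N, where k is the
    number of studies in hand and z_alpha is the one-tailed 5% cut-off.
    """
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha**2 - k


# Hypothetical z-scores for five imaginary studies (not real data).
example_z = [2.0, 1.5, 2.5, 1.0, 0.5]

needed = fail_safe_n(example_z)
print(f"Null studies needed in the file drawer: about {needed:.1f}")
```

The design choice worth noticing is that the answer grows with the square of the summed z-scores, which is why a large database of modestly positive studies can yield a very large fail-safe number. Scargle’s objection, mentioned above, is aimed at the assumptions behind this kind of calculation.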

Undergrad’s sins revealed

Wiseman writes

 To my knowledge, only one paper has revealed an insight into the potential 
scale of this problem. Watt (2006) summarized all of the psi-related final-year
undergraduate projects that have been supervised by staff at Edinburgh 
University’s Koestler Parapsychology Unit between 1987 and 2007.

Well, that tells us something about the research culture at Edinburgh, but is hardly damning of that institution or parapsychology in general. I might ask, how many normal psychology final year student experiments are not included  in databases or published? Perhaps some former psychology students  reading this would care to comment on how many of their experiments were published?

Let’s go back to Wiseman’s article

Only seven of the thirty-eight studies had made it into the public domain, 
presented as papers at conferences held by the Parapsychological Association. 
All of these papers had predicted overall significant performance on the psi task.
 There was a strong tendency for parapsychologists to make public those studies 
that had obtained positive findings, with just over 70 percent (five out of seven) 
of the studies presented at conferences showing an overall significant result,
 versus just 15 percent (three out of twenty) of those that remained unreported.
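
The two proportions in that passage can be compared directly. As a minimal sketch, here is a one-sided Fisher exact test built from the counts as quoted (five of seven presented studies significant, three of twenty unreported ones). Note that seven plus twenty covers only 27 of the 38 projects, so treat this as an illustration of the arithmetic rather than a re-analysis of Watt’s data; the function name is my own.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing `a` or more in the top-left cell
    if the row and column totals are fixed and there is no association.
    """
    row1 = a + b          # e.g. studies presented at conferences
    col1 = a + c          # e.g. studies with a significant result
    n = a + b + c + d
    largest_a = min(row1, col1)
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, largest_a + 1)
    )
    return favourable / comb(n, row1)


# Counts as quoted: presented (5 significant, 2 not),
# unreported (3 significant, 17 not).
p_value = fisher_one_sided(5, 2, 3, 17)
print(f"One-sided p = {p_value:.4f}")
```

So the difference between the two groups is unlikely to be chance, which is not in dispute; the question is what it tells us about anyone beyond these undergraduates.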

Now this is final year undergrad studies, remember! If you think the file drawer effect might be a problem in parapsychology, consider how many final year physics or pharmacology students’ papers are published. In the search for methodological rigour parapsychologists tend to include every study they can in the inclusion criteria for their meta-analyses, so sure, undergrad work might be included – it is controlled for by having separate indices of experimental quality (see Radin’s books for a good discussion) – but let’s face it, most meta-analysis inclusion criteria in almost any discipline do not include undergrad work. To be honest the cynic in me suggests that if large amounts of undergrad work were included in the parapsychology databases, then we critics of the discipline would attack that: “heads I win, tails you lose” indeed.

Next we have to consider the nature of undergrad work. I have seen some superb final year student dissertations. It does not surprise me that those which found positive results were more likely to be published than those which did not. However, are these independent research projects? If so, sure, that is exactly what we would expect to see. An undergrad who proposes an experimental argument against General Relativity may well have their work looked at: an undergraduate who finds that the existing theory is sound is not likely to be published. So if for example some of these studies were original research, which is entirely possible in parapsychology, say a study of an aggression personality test against performance in a PK task, well sure, positive results will be written up, even if by undergrads, as they may have stumbled upon a clue to the mystery of the two class rooms, to use my earlier analogy. What we can not do is what Wiseman appears to do here – generalise from undergrads at Edinburgh to the whole of parapsychology. He writes –

Multiply these figures by the number of parapsychologists who have conducted and supervised psi research across the world over the last sixty years or so, and the scale of the issue becomes apparent.

No it doesn’t – I would suggest it is analogous to taking tea drinking statistics among the members of the Society for Psychical Research attending a lecture in London and then generalizing that heavy tea consumption is a feature of parapsychologists worldwide. 🙂 Watt’s paper is I’m sure fascinating: RiP is not on LEXSCIEN so I can’t read it – what it does not do is tell us anything about the culture of parapsychology generally. To start with, in the period 1987 to 2002 where other than Edinburgh Uni was conducting parapsychological research with undergrads? Edinburgh led the way after Robert Morris’ work there: I can only think of Franklin Pierce in New Hampshire, though there may be other places. I would suggest Edinburgh is very much exceptional in this respect, though I am happy to be proved wrong – somehow I doubt I will be.

Meta-Analyses and Retrospective Data Selection

Well here it gets personal. Wiseman invokes his own paper, Milton, J., and R. Wiseman. 1999. Does psi exist? Lack of replication of an anomalous process of information transfer. Psychological Bulletin 125: 387–391, and the subsequent rows over alleged replications and inclusion criteria. In fact such arguments are pretty much standard in my experience with ANY meta analysis; what do we include, what do we leave out? I am more troubled by the fact no other meta analysis is mentioned – and the strong suggestion that this somehow represents duplicitous behaviour on the part of the critics of Milton & Wiseman 1999. I think the reason it has fallen by the wayside is there have been so many other attempts to perform analysis of the Ganzfeld experiments – Dean Radin’s and Andrew Endersby’s spring to mind, and Matthew Smith, Louie Savva and Ian Hume have to my certain knowledge done Ganzfeld studies in the last few years. One spat does not constitute state of the art in a discipline; and all meta analysis is subject to critique. Were Milton and Wiseman unfairly treated? Quite probably – but that was probably down to errors in the replication – see Persinger and Granqvist’s discussions over a “God Helmet” replication for a classic example of this kind of dispute in another field. Papers are critiqued, and often the critique is unfair: especially when the inclusion criteria are rather subjective. Alleged replications are often no such thing. I shall simply move on…

I must say however that this Wiseman who is apparently sceptical of the Ganzfeld meta-analysis’ overall significance, and a ferocious sceptic of shortcomings in the methodology is also the same Richard Wiseman who wrote two years ago

“I agree that by the standards of any other area of science that remote viewing is proven, but begs the question: do we need higher standards of evidence when we study the paranormal? I think we do. (…) if I said that a UFO had just landed, you’d probably want a lot more evidence. Because remote viewing is such an outlandish claim that will revolutionize [sic] the world, we need overwhelming evidence before we draw any conclusions. Right now we don’t have that evidence.”
~ Richard Wiseman
Daily Mail, January 28, 2008,
pages 28-29.

Some mistake surely? The remote viewing referred to is the Ganzfeld, it seems – the excellent SoapySam from the JREF forum emailed Wiseman, and says he received this clarification:

hi there
yes, i was just talking about the need for more replications under well controlled conditions.
re the remote viewing evidence, I was not referring to RV per se, but rather ESP in general (esp the ganzfeld studies) and I think it got mangled in the final piece
hope that helps


I am at a loss as to what Richard Wiseman actually believes on the issue: I suspect the Daily Mail mangled his words as he says, but I am not sure if he believes the Ganzfeld really does reflect strong evidence for ESP (by normal scientific standards) or not.

Decline Effects and Jumping Ship

This grows lengthy, so I will rapidly look at some of the other claims made in the article.

The alleged psi effects associated with a certain procedure frequently have a curious habit of fading over the course of repeated experimentation. Skeptics argue that this is due to the parapsychologists identifying and minimizing potential methodological and statistical flaws over time. However, some parapsychologists have come up with creative ways of explaining away this potential threat, arguing that such decline effects are either an inherent property of psi or that psychic ability really does exist but is inversely related to the level of experimental controls employed in a study (see Kennedy 2003 for a review of this approach).

That’s a fair summary actually of the decline effect: the SPR website glossary gives – “Decline Effect: Decrease of frequency of correct scores when a psi test is repeated.” Were this actually always found in parapsychology, we might well look with suspicion upon our results. Except – it’s not. The term decline effect can confusingly be used in two different ways – to refer to the falling off of experimental scores in a single experiment, presumably as the test subject grows bored with rolling dice or guessing cards or whatever, or in the sense Wiseman refers to here – the fact that Ganzfeld experimenters today get on average lower results than they did in the 1980s. Now maybe the moon is no longer in the seventh house, the sunspot activity is interfering, or some other unknown variable is to blame – or maybe it is exactly what Wiseman suggests, an increase in methodological rigour. (Some recent studies suggest this common sense solution may be incorrect however, but they are based upon subjective measures of experimental quality. I honestly don’t know…).

However this is in fact far from surprising to me:  I have seen it before, and not in parapsychology .

Almost every new treatment in mental health goes through this process; ditto many drug trials. Do drugs become less effective as the years progress? Sure, in the case of some antibiotics, they do. Maybe the initial trials always massively inflate the efficacy of a treatment, whether it be a drug or a therapy. Or maybe, just maybe, we are seeing something else: I intend to write on the Flynn Effect sometime soon.
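To make the "initial trials inflate the efficacy" possibility concrete, here is a rough sketch of one way it could happen. It is purely illustrative: the function name, the true effect size, the trial size and the significance cutoff are all my own assumptions, not figures from any drug trial or Ganzfeld study. The idea is simply that if a first trial only gets noticed when it happens to come out significant, the noticed first trials will overstate a modest real effect, and unselected follow-ups will appear to "decline" back towards it.

```python
import random
import statistics


def simulate_decline(true_effect=0.2, n_per_trial=25, n_trials=20000, seed=1):
    """Compare selected 'first' trials with their unselected follow-ups.

    All parameters are hypothetical values chosen for illustration only.
    Returns (mean effect in noticed first trials, mean effect in follow-ups).
    """
    rng = random.Random(seed)
    # Standard error of the mean of n scores with unit variance.
    se = 1.0 / n_per_trial ** 0.5
    # One-sided 5% significance cutoff for the observed effect.
    threshold = 1.645 * se

    initial, followups = [], []
    for _ in range(n_trials):
        first = rng.gauss(true_effect, se)
        if first > threshold:
            # Only 'significant' first trials attract attention...
            initial.append(first)
            # ...and each one gets a follow-up that is reported regardless.
            followups.append(rng.gauss(true_effect, se))
    return statistics.mean(initial), statistics.mean(followups)


initial_mean, followup_mean = simulate_decline()
print("mean effect in noticed first trials:", round(initial_mean, 3))
print("mean effect in follow-up trials:   ", round(followup_mean, 3))
```

Under these assumptions the real effect never changes, yet the follow-ups average lower than the first trials that prompted them. That does not show this is what happens in parapsychology or in drug trials; it only shows that a decline of this kind needs no change in the underlying phenomenon.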

Wiseman goes on to discuss the changing paradigms in psi research over the last few decades; he speaks much sense here. The Cinderella Science, as Archie Roy called it, does indeed change, and new perspectives appear to offer hope of finally solving the anomaly. When I have time I will look at the changing patterns of critical response from sceptics, and demonstrate how they too go through shifts: the two have a reflexive relationship. I don’t see this as a weakness of parapsychology: far from it, I see it as a strength in the evolving quest for truth. In fact, we can see a very similar progression in psychology: parapsychology reflects the wider shifts, from Behaviourist to Cognitive to Neurological work, and as psychology as a discipline evolves, parapsychology also evolves. Sure, some of it is fashion, but really, and I am sure outraged readers will howl at this, the same fashions and the same shifts occur in most of the sciences. I could probably write exactly the same about many other disciplines, but for now, the fact that Kuhn was right and paradigms shift tells us nothing about the truth or otherwise of the key parapsychological phenomena.

An interesting essay, but I’m still not sure I understand its purpose or implication, though I expect it will make CSI readers happy?

cj x


About Chris Jensen Romer

I am a profoundly dull, tedious and irritable individual. I have no friends apart from two equally ill mannered cats, and a lunatic kitten. I am a ghosthunter by profession, and professional cat herder. I write stuff and do TV things and play games. It's better than being real I find.
This entry was posted in Debunking myths, Paranormal, Science.

8 Responses to Making Heads or Tails of Richard Wiseman on CSI

  1. paul cornell says:

    enjoyed this short exec summary looking forward to reading the full piece in due course!

  2. WendyC says:

    Nicely done CJ! I’m also rather bemused at the point of the Wiseman article.
    In my head I’m imagining it as a LOL cats Cartoon strip…

    O hai, Iz Richard,
    I haz majick! (srsly)
    Parapsyckologi can haz problems..
    here, let me show u dem…

    The impression given is that parapsychology meetings in recent years *haven’t* been having vigorous debate about how to improve and standardise protocols, but CSI readers won’t know that… So Prof Wiseman gets to appear radical, whereas parapsychology is in a lose-lose situation in that a) it could be portrayed as reactionary if it argues the point, b) it could look like it was needled into following Prof Wiseman’s lead if it goes on doing what it was doing in terms anyway…

    Incidentally, the upcoming Research Excellence Framework will assess university research standards and allocate funding using a bibliometric system involving article citation rates… picking a fight in print is one way of getting your citation rates heading skywards as people rush to rebut you. So even if you’re shown to be wrong, you still win.

  3. WendyC says:

    oops, missed a bit out… “in terms of improving protocols and standardising programmatic research” I should have said.

  4. Dave Smith says:

    The Watt (2006) study can be found here:

    Just scroll down to the bottom of the page.

  5. ersby says:

    Them’s a lots of words!

    A few points, though. First, the Dalton 1994 paper actually reported on only one informal ganzfeld trial: her own, which was statistically significant. Since the file drawer is where unsuccessful papers are hidden away, this publication actually illustrates the file drawer effect rather than compensating for it.

    Meanwhile, Radin’s RNG meta-analysis must be one of the most picked apart meta-analyses I’ve seen. I’ll try and dig up some citations in a day or two.

    And I’m not sure that the ganzfeld experiments do show a decline effect. I know some people say they do, but they’re not dealing with a full data set.

    The problem is that parapsychology has so many confounding factors that can be brought into play when explaining results as and when necessary.

  6. Chris Jensen Romer says:

    Yes, when I mentioned Dalton I was showing that the FDE was well known in the parapsychological community: she claims a VERY high success rate as far as I recall, but I have seen no follow-up or other data? I think Dalton’s studies were mentioned as an example of an outlier as well?
    As to Radin, there are several, the Entangled Minds set being the ones I was thinking of: but yes, extensively critiqued, though he won a retraction, I believe, on criticism of his first set? Time for bed, will discuss tomorrow…
    cj x

  7. Chris Jensen Romer says:

    Thanks Dave for the link – I’ll go take a read!

    cj x

  8. Stephen says:

    Excellent analysis, the point about hidden variables and falsification is frequently glossed over by critics of this area.

    It’s interesting to note that in order to falsify any null hypothesis, only one valid experimental falsification is required.

    The fact that repeatability may be variable due to unknown and/or uncontrollable hidden variables is neither here nor there in terms of establishing the validity of the hypothesis – if we have established that it’s not an experimental artifact due to poor methodology, then the only reasonable conclusion is that it’s a result of something that we are not controlling for.

    The fact that we can’t do this on demand by no means preserves the null hypothesis – as long as we’re still getting results which falsify the hypothesis – then we absolutely have to account for those results.
