Argument # 17: Experiments that show positive results for psi must be replicable to count as evidence.
Corollary: “I won’t consider successful psi experiments as evidence of psi unless the results are replicated and peer reviewed.”
Besides claiming a lack of controls, pseudoskeptics also demand that psi experiments be replicable to count as evidence. While this standard may seem reasonable scientifically, it is usually just another tactic to raise the bar, because no matter how many times a successful psi experiment is replicated, they will still demand an ever-higher rate of replication. (If the 2,549 sessions of the Ganzfeld and Autoganzfeld experiments conducted from 1974 to 1997 by different research laboratories, which produced above-chance results, don’t count as replicable, then what would?) This is because these guys are all about arguing and playing hopscotch games. No matter what, they never concede that they are wrong, and will use every slimy tactic they can find to deny what they don't believe in. If cornered, they will change the topic or rant about something irrelevant. That's just the way they are.
The first problem with this argument is that just because something hasn’t been replicated doesn’t mean that it didn’t happen. For example, if an Olympic track and field runner breaks a world record, and other athletes don’t repeat it, it doesn’t mean that it never happened. Likewise, if I won a slot machine jackpot or threw a quarter that landed on its edge (against astronomical odds), but I wasn’t able to repeat it, it doesn’t mean that it never happened the first time. Similarly, phenomena such as supernovas, ball lightning, and comets are natural phenomena that cannot be replicated under our control but are acknowledged to exist anyway. Therefore, replicating the appearance of UFOs or ghosts may not be possible because they are out of our control, but that doesn’t mean they never happen or don’t exist. All it would take is one genuine case of a UFO or ghost to prove that they were real and possible. As an unnamed law says: “If it happens once, then it is possible.”
In fact, the very nature of psychic phenomena makes them difficult to replicate. Dean Radin, Ph.D., Director of the Consciousness Research Laboratory at the University of Nevada, and author of The Conscious Universe: The Scientific Truth of Psychic Phenomena, lists eight reasons why this is so: (page 40)
“Psi effects do not fall into the class of easily replicated effects. There are eight typical reasons why replication is difficult to achieve: (1) the phenomenon may not be replicable; (2) the written experimental procedures may be incomplete, or the skills needed to perform the replication may not be well understood; (3) the effect under study may change over time or react to the experimental procedure; (4) investigators may inadvertently affect the results of their experiments; (5) experiments sometimes fail for sociological reasons; (6) there are psychological reasons that prevent replications from being easy to conduct; (7) the statistical aspects of replication are much more confusing than most people think; and (8) complications in experimental design affect some replications.”
The second problem with this argument is that successful psi experiments definitely have been replicated by different researchers and laboratories. One famous, solid example is the series of telepathy studies known as the Ganzfeld experiments, in which subjects guess target images while sitting with ping pong ball halves over their eyes and listening to relaxing white noise, a setup designed to deprive them of sensory stimuli and heighten their intuition and psychic abilities. These have been replicated for decades. Dean Radin, in the same book quoted above, describes the replicability of the Ganzfeld experiments: (page 78-79)
“At the annual convention of the Parapsychological Association in 1982, Charles Honorton presented a paper summarizing the results of all known ganzfeld experiments to that date. He concluded that the experiments at that time provided sufficient evidence to demonstrate the existence of psi in the ganzfeld…
At that time, ganzfeld experiments had appeared in thirty-four published reports by ten different researchers. These reports described a total of forty-two separate experiments. Of these, twenty-eight reported the actual hit rates that were obtained. The other studies simply declared the experiments successful or unsuccessful. Since this information is insufficient for conducting a numerically oriented meta-analysis, Hyman and Honorton concentrated their analyses on the twenty-eight studies that had reported actual hit rates. Of those twenty-eight, twenty-three had resulted in hit rates greater than chance expectation. This was an instant indicator that some degree of replication had been achieved, but when the actual hit rates of all twenty-eight studies were combined, the results were even more astounding than Hyman and Honorton had expected: odds against chance of ten billion to one. Clearly, the overall results were not just a fluke, and both researchers immediately agreed that something interesting was going on. But was it telepathy?”
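As a rough illustration of why twenty-three of twenty-eight studies scoring above chance could be read as an "instant indicator" of replication, the count alone can be checked with a simple sign test. This is only a sketch under assumptions that are not stated in the quoted passage: that the twenty-eight studies are independent, and that under pure chance each study would be equally likely to land above or below chance expectation. It is not the combined hit-rate analysis that produced the ten-billion-to-one figure, and the function name is made up for this example.

```python
from math import comb

def sign_test_upper_tail(successes: int, trials: int) -> float:
    """Probability that at least `successes` of `trials` studies land above
    chance, ASSUMING each study is an independent 50/50 coin flip under the null."""
    # Count the outcomes with `successes` or more above-chance studies.
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    # Every one of the 2**trials outcomes is equally likely under the null.
    return favorable / 2 ** trials

p_value = sign_test_upper_tail(23, 28)
print(f"p = {p_value:.5f}, roughly 1 in {1 / p_value:,.0f}")
```

Under these assumptions the probability comes out to about 0.0005, or roughly 1 in 2,200, which is already unlikely before any hit rates are combined.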
Radin further elaborates on how researcher Charles Honorton tested whether independent replications had actually been achieved: (page 79)
“To address the concern about whether independent replications had been achieved, Honorton calculated the experimental outcomes for each laboratory separately. Significantly positive outcomes were reported by six of the ten labs, and the combined score across the ten laboratories still resulted in odds against chance of about a billion to one. This showed that no one lab was responsible for the positive results; they appeared across-the-board, even from labs reporting only a few experiments. To examine further the possibility that the two most prolific labs were responsible for the strong odds against chance, Honorton recalculated the results after excluding the studies that he and Sargent had reported. The resulting odds against chance were still ten thousand to one. Thus, the effect did not depend on just one or two labs; it had been successfully replicated by eight other laboratories.”
On the same page, he then soundly dismisses the skeptical claim that the file-drawer effect (selective reporting) could skew the meta-analysis results in favor of psi: (page 79-80)
“Another factor that might account for the overall success of the ganzfeld studies was the editorial policy of professional journals, which tends to favor the publication of successful rather than unsuccessful studies. This is the “file-drawer” effect mentioned earlier. Parapsychologists were among the first to become sensitive to this problem, which affects all experimental domains. In 1975 the Parapsychological Association’s officers adopted a policy opposing the selective reporting of positive outcomes. As a result, both positive and negative findings have been reported at the Parapsychological Association’s annual meetings and in its affiliated publications for over two decades.
Furthermore, a 1980 survey of parapsychologists by the skeptical British psychologist Susan Blackmore had confirmed that the file-drawer problem was not a serious issue for the ganzfeld meta-analysis. Blackmore uncovered nineteen complete but unpublished ganzfeld studies. Of those nineteen, seven were independently successful with odds against chance of twenty to one or greater. Thus while some ganzfeld studies had not been published, Hyman and Honorton agreed that selective reporting was not an important issue in this database.
Still, because it is impossible to know how many other studies might have been in file drawers, it is common in meta-analyses to calculate how many unreported studies would be required to nullify the observed effects among the known studies. For the twenty-eight direct-hit ganzfeld studies, this figure was 423 file-drawer experiments, a ratio of unreported-to-reported studies of approximately fifteen to one. Given the time and resources it takes to conduct a single ganzfeld session, let alone 423 hypothetical unreported experiments, it is not surprising that Hyman agreed with Honorton that the file-drawer issue could not plausibly account for the overall results of the psi ganzfeld database. There were simply not enough experimenters around to have conducted those 423 studies.
Thus far, the proponent and the skeptic had agreed that the results could not be attributed to chance or to selective reporting practices.”
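For readers unfamiliar with the file-drawer calculation, here is a minimal sketch of one common way to estimate how many unreported null studies it would take to cancel a combined result: Rosenthal's fail-safe N. The quoted passage does not say which formula produced the 423 figure, so treat the method here as an assumption. The z-scores below are hypothetical placeholders, not the real ganzfeld data, and the function name is made up for this example.

```python
def fail_safe_n(z_scores: list[float], z_alpha: float = 1.645) -> float:
    """Rosenthal's fail-safe N: the number of unreported studies averaging
    zero effect needed to pull the combined result down to one-tailed p = .05."""
    k = len(z_scores)
    # (sum of Z)^2 / z_alpha^2 is the total study count at which the combined
    # Stouffer Z drops to z_alpha; subtract the k studies already in hand.
    return sum(z_scores) ** 2 / z_alpha ** 2 - k

# HYPOTHETICAL z-scores for ten studies, purely to show the call.
example_z = [2.0] * 10
print(round(fail_safe_n(example_z)))

# The ratio quoted in the text: 423 unreported studies per 28 reported ones.
ratio = 423 / 28
print(f"{ratio:.1f} to 1")
```

The last two lines only re-check the arithmetic behind the "approximately fifteen to one" ratio given in the quote.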
Another skeptical argument against the ganzfeld studies is sensory leakage. Radin addresses this as well: (page 81-82)
“Because the ganzfeld procedure uses a sensory-isolation environment, the possibility of sensory leakage during the telepathic “sending” portion of the session is already significantly diminished. After the sending period, however, when the receiver is attempting to match his or her experience to the correct target, if the experimenter interacting with the receiver knows the identity of the target, he or she could inadvertently bias the receiver’s ratings. One study in the ganzfeld database contained this potentially fatal flaw, but rather than showing a wildly successful result, that study’s participants actually performed slightly below chance expectation…
Despite variations in study quality due to these and other factors, Hyman and Honorton both concluded that there was no systematic relationship between the security methods used to guard against sensory leakage and the study outcomes. Honorton proved his point by recalculating the overall results only for studies that had used duplicate target sets. He found that the results were still quite strong, with odds against chance of about 100,000 to 1.”
Where skeptic Ray Hyman disagreed with Charles Honorton was in the role of randomization flaws affecting the ganzfeld results. However, as Radin points out, the consensus of the experts on meta-analysis is against Hyman’s hypothesis: (page 82-83)
“A similar concern arises for the method of randomizing the sequence in which the experimenter presents the target and the three decoys to the receiver during the judging process. If, for example, the target is always presented second in the sequence of four, then again, a subject may tell a friend, and the friend, armed with knowledge about which of the four targets is the real one, could successfully select the real target without the use of psi.
Although these scenarios are implausible, skeptics have always insisted on nailing down even the most unlikely hypothetical flaws. And it was on this issue, the importance of randomization flaws, that Hyman and Honorton disagreed. Hyman claimed that he saw a significant relationship between randomization flaws and study outcomes, and Honorton did not. The sources of this disagreement can be traced to Honorton’s and Hyman’s differing definitions of “randomization flaws,” to how the two analysts rated these flaws in the individual studies, and to how they statistically treated the quality ratings.
These sorts of complicated disagreements are not unexpected given the diametrically opposed convictions with which Honorton and Hyman began their analyses. When such discrepancies arise, it is useful to consider the opinions of outside reviewers who have the technical skills to assess the disagreements. In this case, ten psychologists and statisticians supplied commentaries alongside the Honorton-Hyman published debate that appeared in 1986. None of the commentators agreed with Hyman, while two statisticians and two psychologists not previously associated with this debate explicitly agreed with Honorton.
In two separate analyses conducted later, Harvard University behavioral scientists Monica Harris and Robert Rosenthal (the latter a world-renowned expert in methodology and meta-analysis) used Hyman’s own flaw ratings and failed to find any significant relationships between the supposed flaws and the study outcomes. They wrote, “Our analysis of the effects of flaws on study outcome lends no support to the hypothesis that ganzfeld research results are a significant function of the set of flaw variables.”
In other words, everyone agreed that the ganzfeld results were not due to chance, nor to selective reporting, nor to sensory leakage. And everyone, except one confirmed skeptic, also agreed that the results were not plausibly due to flaws in randomization procedures. The debate was now poised to take the climactic step from Stage 1, “It’s impossible,” to Stage 2, “Okay, so maybe it’s real.”
Even after the successful replicable series of ganzfeld experiments, further replicability was found in the computer-controlled autoganzfeld experiments, designed to be even more efficient and controlled than the original ganzfeld experiments (although not shown to be significant as mentioned above). This time though, two magicians who specialized in mentalism were brought in to check the protocols for cheating loopholes, as Radin describes: (page 86)
“In addition, two professional magicians who specialized in the simulation of psi effects (called “mentalists” or “psychic entertainers”) examined the autoganzfeld system and protocols to see if it was vulnerable to mentalist tricks or conjuring-type deceptions. One of the magicians was Ford Kross, an officer of the Psychic Entertainers Association. Kross provided the following written statement about the autoganzfeld setup:
In my professional capacity as a mentalist, I have reviewed Psychophysical Research Laboratories’ automated ganzfeld system and found it to provide excellent security against deception by subjects.
The other magician was
Radin summarizes the results of the autoganzfeld experiments as follows: (page 86)
“The bottom line for the eleven series, consisting of a total of 354 sessions, was 122 direct hits, for a 34 percent hit rate. This compares favorably with the 1985 meta-analysis hit rate of 37 percent. Honorton’s autoganzfeld results overall produced odds against chance of forty-five thousand to one.”
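The hit-rate arithmetic in this passage can be sanity-checked with a short binomial calculation. The sketch below assumes a chance rate of one in four (one target among three decoys, as in the judging procedure described earlier) and treats the 354 sessions as independent trials. Radin's exact method is not given in the quote, so the result should only be expected to land in the same general range as his figure, not to reproduce it. The function name and parameters are made up for this example.

```python
from math import comb

def binomial_upper_tail(hits: int, sessions: int,
                        chance_num: int = 1, chance_den: int = 4) -> float:
    """Exact probability of `hits` or more direct hits in `sessions` trials,
    ASSUMING independent sessions with hit probability chance_num/chance_den."""
    miss_num = chance_den - chance_num
    # Sum the binomial terms in exact integer arithmetic to avoid rounding error.
    numerator = sum(
        comb(sessions, k) * chance_num ** k * miss_num ** (sessions - k)
        for k in range(hits, sessions + 1)
    )
    return numerator / chance_den ** sessions

hit_rate = 122 / 354
p_value = binomial_upper_tail(122, 354)
print(f"hit rate = {hit_rate:.1%} against 25% expected by chance")
print(f"one-tailed p = {p_value:.2e}, i.e. odds against chance of about {1 / p_value:,.0f} to 1")
```

The integer-only sum is a design choice to keep the tail probability exact; a normal approximation would be shorter but less accurate this far out in the tail.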
Further replications beyond the ganzfeld and autoganzfeld experiments include the following: (page 87-88)
replications were reported by psychologist
Kathy Dalton and her colleagues at the Koestler Chair of
Department of Psychology,
While only the 1985
meta-analysis, the autoganzfeld
study, and the
Finally, at the end of the chapter, Radin sums up what the findings of the ganzfeld experiments and those before them suggest: (page 88)
“Now jointly consider the results of the ganzfeld psi experiments, the dream-telepathy experiments of the 1960s and 1970s, the ESP cards tests from the 1880s to the 1940s, Upton Sinclair’s experiments in 1929, and earlier studies on thought transference. The same effects have been repeated again and again, by new generations of experimenters, using increasingly rigorous methods. From the beginning, each new series of telepathy experiments was met with its share of skeptical attacks. These criticisms reduced mainstream scientific interest in the reported effects, but ironically they also refined the methods used in future experiments to the point that today’s ganzfeld experiments stump the experts.”
Thus from all this, it is indisputable that we have solid scientific and statistical evidence that one of the most successful and controlled series of telepathy experiments in history, the Ganzfeld experiments, was definitely replicable. Therefore, the skeptical challenge has been met, and it’s up to the skeptics to accept the obvious data or reject it. Radin’s book describes many other replicable psi experiments as well, including ESP, clairvoyance, remote viewing, and psychokinesis, so I highly recommend it. For more details about the Ganzfeld and Autoganzfeld experiments, see the following detailed articles which can be viewed online: