This post follows a slightly different format from previous
entries. Rather than reviewing an interesting psychology study, I offer something a little
different, but still strongly themed by my day job of doing psychology
research…

Chapter 6 of AA Milne’s “The House at Pooh Corner”
introduces the now famous game of Poohsticks. The story goes that Winnie the
Pooh invents the game after accidentally dropping a pine cone off a bridge into
a flowing river. Having chanced upon the observation that a cone dropped over
one side of the bridge will be carried by the current of the water passing
beneath to the other side of the bridge, Pooh’s first thought is whether this
might be what scientists would call a replicable phenomenon, or in everyday
speak, something that can be repeated. He says: “That’s funny… I dropped it on
the other side… and it came out on this side! I wonder if it would do it
again?”. For a character based on a soft toy, Pooh bear has a remarkably
scientific outlook on life!

He tries this several times and then by and by develops the
game into a race by dropping two cones at once and trying to guess which will
emerge on the other side of the bridge first as the race winner. By the time he left to go home for tea, Pooh had won 36 times,
by correctly guessing the first cone to emerge, and lost 28 times. Milne
suggests that this means that Pooh was – well, actually, the narrative stops
short of making any kind of judgement as to Pooh’s predictive abilities. Milne
explains it by saying that Pooh was: “well, you take twenty-eight from
thirty-six, and *that’s* what he was”.

I was reading this
story to my daughter at bedtime, and upon seeing this set of scores, the
psychologist-statistician in me rather came to the fore. There is a statistical
test of whether that distribution of scores – 36 correct and 28 incorrect – is
likely to be due to chance, or not. If performance were at chance level, then
this would suggest that Pooh was really only guessing which cone emerged
from under the bridge first. The
alternative possibility is that this number of correct predictions would be
unlikely to be due to chance. In that case one could argue that Winnie the Pooh
was applying some logic or skill of judgement in order to make consistently
correct, above chance level predictions on the outcomes of Poohstick races. So,
the first thing I did this morning was run the test and see!

The statistical test in question is called the “Chi-Square
Goodness of Fit Test” (“chi” is pronounced like “sky” without the “s”). You can
look up a technical description of it on Wikipedia, but I’ll try and provide a
more straightforward description here.

It works by comparing real-life scores or data (here it is
Pooh’s tally of Poohstick race results) with perfect 50-50 chance level of
performance. In the case of Poohsticks, for a score of 36 vs. 28 there must
have been 64 races in total (36 + 28 = 64). For 64 Poohstick races, the perfect
50-50 chance level of performance is 32 guessed correctly and 32 guessed
incorrectly. The chi-square test helps us to decide what we should make of a
change of 4 either side of that (32 – 4 = 28 and 32 + 4 = 36). There are two
possible outcomes – probably chance level of performance (consistent with
guessing) or probably non-chance level of performance (consistent with applying
skill and judgement).
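
The comparison just described can be worked through by hand. The following is a minimal Python sketch, not the SPSS procedure used for this post, and the function name is purely illustrative. It sums the squared gaps between the observed and expected counts, each divided by the expected count.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [36, 28]                  # Pooh's correct and incorrect guesses
total = sum(observed)                # 64 races in all
expected = [total / 2, total / 2]    # 32 and 32 under pure guessing

statistic = chi_square_statistic(observed, expected)
print(total, expected, statistic)    # 64 [32.0, 32.0] 1.0
```

Each side is 4 away from 32, so each contributes 16 / 32 = 0.5, giving a statistic of 1.0.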

The crucial thing that enables a decision to be made about
chance or non-chance performance is that someone, somewhere made very many
observations of what happens over a series of chance level 50-50 calls. Perhaps
they tossed a coin very many times, each time guessing first whether heads or
tails would come up, and keeping a tally of whether they were right or wrong
each time. In doing this they were mapping and defining the chance level of
performance. Knowing what happens by chance helps us to decide whether a new
set of scores resembles chance, or something else. The decision rests on the
size of the difference between the correct and incorrect calls. Differences so
large that they occur no more than 5% of the time under chance conditions are deemed to
be “statistically significant”. Such differences are usually understood to be
unlikely to be due to chance, and so likely to be due to some kind of
phenomenon, such as, in this example, skill at Poohsticks.
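
The idea of mapping out chance by sheer repetition can be imitated on a computer. This is a rough sketch using Python's standard `random` module, with simulated guessers standing in for real coin tossers; the number of simulated sessions (20,000) and the seed are arbitrary assumptions, and the function name is illustrative. It counts how often pure guessing over 64 races lands at least as far from 32 as Pooh's 36 did.

```python
import random

def simulate_sessions(n_sessions=20_000, n_races=64, seed=2012):
    """Return the number of correct guesses in each session of pure 50-50 guessing."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_races))
            for _ in range(n_sessions)]

scores = simulate_sessions()
# A session is "at least as lopsided as Pooh's" if it is 4 or more away from 32.
as_extreme = sum(abs(score - 32) >= 4 for score in scores)
proportion = as_extreme / len(scores)
print(f"{proportion:.2f} of guessing sessions were at least as lopsided as 36 vs. 28")
```

The proportion should come out far above the 5% cut-off, which is what a result that looks like chance would be expected to show.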

So which was it for Winnie the Pooh’s first ever set of
Poohsticks scores? I ran the analysis using the computer software SPSS©. The chi-squared goodness of fit test showed that there was no significant effect, chi-square = 1.000, df = 1, p = 0.317. (NB In that last sentence I have reported the chi-squared test statistics in the same way that scientists would do in a research paper; “df” stands for degrees of freedom, and “p” stands for “probability”.) No significant effect means that Pooh’s performance was consistent with chance. This tells us that anyone could obtain a score of 36 correct predictions and 28 incorrect predictions just by guessing the outcome of a series of Poohsticks races without using any skill or judgement.

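
For readers without SPSS, the same figures can be checked in a few lines of Python. This is a sketch rather than a reproduction of what SPSS does internally: it relies on the fact that, with one degree of freedom, the chi-square p-value equals `erfc(sqrt(statistic / 2))`, and the function name is illustrative.

```python
import math

def chi_square_gof_two_categories(correct, incorrect):
    """Chi-square goodness of fit against a 50-50 split (df = 1)."""
    expected = (correct + incorrect) / 2
    statistic = ((correct - expected) ** 2 + (incorrect - expected) ** 2) / expected
    # For df = 1 only: the upper-tail probability of chi-square is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

statistic, p_value = chi_square_gof_two_categories(36, 28)
print(f"chi-square = {statistic:.3f}, df = 1, p = {p_value:.3f}")
# chi-square = 1.000, df = 1, p = 0.317
```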
Based on his original Poohsticks predictions you could argue
that there is no evidence of Winnie the Pooh being anything other than a Bear
of Very Little Brain. But let’s cut him some slack – he did invent the still-popular pursuit of dropping cones and twigs into water on one side of a bridge
before dashing across to watch them emerge on the other side. Poohsticks is a
wonderful pastime in itself regardless of whether one can guess in what order
they will appear. But just for the competitively minded out there – you would
need to be correct in at least 40 out of 64 Poohsticks races in order for the
chi-square test to return a significant result. Better get practising!
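
That threshold of 40 can be checked with a short search. This is a minimal sketch, assuming the conventional 5% significance level and using the one-degree-of-freedom shortcut (`erfc`) in place of SPSS; the helper name is made up for illustration.

```python
import math

def p_value_for_score(correct, races=64):
    """p-value of the chi-square goodness of fit test against 50-50 (df = 1)."""
    expected = races / 2
    incorrect = races - correct
    statistic = ((correct - expected) ** 2 + (incorrect - expected) ** 2) / expected
    return math.erfc(math.sqrt(statistic / 2))

# Smallest score at or above chance whose p-value drops below 0.05.
threshold = next(c for c in range(32, 65) if p_value_for_score(c) < 0.05)
print(f"{threshold} correct: p = {p_value_for_score(threshold):.3f}")   # 40 correct: p = 0.046
print(f"{threshold - 1} correct: p = {p_value_for_score(threshold - 1):.3f}")
```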

Post script

I posted the above on 2 July 2012. Today, 6 June 2013, a contributor, Eric, points out that I was not the first person to whom the idea of performing a chi-square on Winnie the Pooh's performance at Poohsticks occurred! An article in the journal "Teaching Statistics" by Eric D. Nordmoe (linked in the comments below) precedes my effort by 8 years! Apologies, Eric.

Please credit the earlier paper.

http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9639.2004.00163.x/abstract

Thanks!

Eric

Eric - my apologies, you had the idea before me! I'll edit the post forthwith!