If we imagine a box containing 64 beans, of which some are red and some are blue, the probability of pulling out a red bean is directly proportional to the number of red beans in the box. You could plot this as a graph of probability, stated either as a fraction or as a percentage, against the number of red beans, ranging from zero to 64, and the graph would be a straight line through the origin.

The essence of Rasch is that the probability of a child answering an item correctly is a function of the ability of the child. I am not sure that there is any need to complicate this. It seems fine just as it is. Yet if you read about the iterations of Winsteps, it is made complicated. Before looking more closely at the iterations of Winsteps, I should like to spend a little more time with my simple model.

Imagine a test comprising 64 items, each represented by a box of 64 beans, being sat by 64 children, who are also each represented by a box of 64 beans. Imagine the difficulty of the items being perfectly graduated, so that the first item box contains all red beans, the second one blue bean, the third two blue beans and so on. And imagine the abilities of the children to be perfectly graduated such that the first child box contains one red bean, the second two and so on to the last box, which contains 64 red beans. And imagine we impose a deterministic rule such that if the probability of a correct answer is 50% or greater, a correct answer is given, and if it is less than 50% an incorrect answer is given. In this case the first child would answer one item correctly, the second two items and so on. And the first item would be answered correctly 64 times, the second 63 times and so on.
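The forced test can be sketched in a few lines of Python. The text does not spell out how a child's box and an item's box combine into a single probability, so the rule below is an assumption chosen only because it reproduces the scores stated above: a child answers correctly when the child's red beans outnumber the item's blue beans. The names `answers_correctly`, `child_scores` and `item_scores` are illustrative.

```python
N = 64  # items, children, and beans per box, as in the text

# Item i (counting from 1) holds i - 1 blue beans: the first item box is all red.
item_blue = [i - 1 for i in range(1, N + 1)]
# Child c (counting from 1) holds c red beans.
child_red = [c for c in range(1, N + 1)]


def answers_correctly(red: int, blue: int) -> bool:
    # Assumed deterministic rule (not stated in this form in the text):
    # correct when the child's red beans outnumber the item's blue beans.
    return red > blue


# Raw score per child: how many of the 64 items that child gets right.
child_scores = [sum(answers_correctly(r, b) for b in item_blue) for r in child_red]
# Raw score per item: how many of the 64 children get that item right.
item_scores = [sum(answers_correctly(r, b) for r in child_red) for b in item_blue]

print(child_scores[:3], child_scores[-1])  # [1, 2, 3] 64
print(item_scores[:3], item_scores[-1])    # [64, 63, 62] 1
```

Under this assumed rule each child's raw score equals that child's number of red beans, which is the straight-line relationship the simple model is built on.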

I should like to pause to reflect on what we have here. Both the difficulty of the items and the abilities of the children have been revealed (by this somewhat artificial test) to be spread over a range, and the range of both difficulty and ability can be expressed on the same scale, either with or without units. Ability and difficulty could be measured in beans, over a range from zero to 64, or in probability, over a range from zero to 1. The followers of Rasch make both these claims: that ability and difficulty are measured on the same scale, and that the scale is without units, or more importantly, that the measurement is independent of the units chosen.

There is something else important I should like to note. It may seem blindingly obvious here, but it is not made obvious or even explicitly acknowledged in most of the Rasch literature. A child with an ability of 32 red beans, or probability 0.5 or 50%, is a child with median ability. Likewise an item with 32 blue beans has median difficulty, but more importantly, at risk of repetition, a child who answers a 32 blue bean, or 50% probability, item correctly 50% of the time is a child with median ability. A child who answers a 16 blue bean question correctly 50% of the time is on the boundary of the first quartile, and a child who answers a 48 blue bean question correctly 50% of the time is on the boundary of the upper quartile. So while Rasch measurement claims to be peer independent, and indeed it may be independent of the peer subset sitting an already calibrated test, it is not independent of the calibration population. Item difficulty, as defined in Rasch measurement, is in fact entirely dependent on the population of children sitting the test for the original calibration. Items with a difficulty of zero logits represent the median, and children who answer those questions correctly 50% of the time are on the median line of the calibration population.

All of this was a rather lengthy introduction to the transformations described in the Winsteps documentation. The page begins with the following sentence: "The Rasch model formulates a non-linear relationship between non-linear raw scores and linear measures." It continues with the following two sentences: "So, estimating measures from scores requires a non-linear process. This is performed by means of iteration."

Now as I said above, the essential posit of Rasch is that the rightness or wrongness of an answer given in a test or any psychometric instrument is a random event. So the interpretation of results from a test or instrument has to be carried out with caution.

Two things come to mind here. The first is that when I was at school (studying mathematics and physics) we were taught that the best way to reduce the influence of errors of measurement was to take many readings. If you take one or a very small number of readings, your results will be subject to error, and no amount of fancy mathematics carried out after the event will alter that. The second is that the simplistic model I described at the opening of this blog could be described as a probabilistic model, albeit compromised at one point, and out of it came a **linear** relationship between raw scores and ability. Admittedly, when you remove the deterministic assumption, there will be oscillation around the line, but the underlying straight line will still be there.
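Both points can be illustrated with a small simulation. This is only a sketch under stated assumptions: beans are drawn with replacement, the seed is arbitrary, and the function name `observed_proportion` is made up for the example. It reports how far the observed share of red beans strays from the line y = x/64 as the number of draws per box grows.

```python
import random

N = 64  # beans per box, as in the text


def observed_proportion(red_beans: int, draws: int, rng: random.Random) -> float:
    """Draw with replacement from a box of N beans; return the share of reds seen."""
    hits = sum(rng.random() < red_beans / N for _ in range(draws))
    return hits / draws


rng = random.Random(2009)  # arbitrary seed, used only so that runs repeat
for draws in (4, 64, 4096):
    # Largest distance between the observed share and the line y = x/64,
    # taken over boxes holding 0 to 64 red beans.
    worst = max(abs(observed_proportion(x, draws, rng) - x / N) for x in range(N + 1))
    print(f"{draws:5d} draws per box: largest gap from the line = {worst:.3f}")
```

With only a handful of draws the observed points scatter widely; with many draws they settle close to the straight line.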

Let's return to Winsteps where they say next: "The fundamental transformation in linearizing raw scores is:

"log ( (observed raw score - minimum possible raw score) / (maximum possible raw score - observed raw score) )"

In the diagram below I have plotted (in blue) the probability of pulling a red bean out of each of 64 boxes labelled 1 to 64, in which box 1 contains 1 red bean, box 2 contains 2 red beans and so on. In the forced example described above, this would equate to the observed raw scores. And as mentioned above, it is a straight line passing through the origin. The formula for the line is:

y = x/64

I have then charted (in yellow) "The fundamental [Winsteps] transformation" before taking the log. The most obvious observation about this "linearizing transformation" is that it has transformed a straight line into a curve. The formula for the curve is:

y = x/(64 - x)

It looks a lot simpler than the quoted verbal version, not least because my minimum score is zero. If you read further into the Winsteps documentation you will see that the software doesn't like zero or maximum scores, and the reasons are pretty obvious from the formula. But if they didn't carry out the transformation, zeros and maximum scores wouldn't be a problem.
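The quoted verbal version can be written out as a function to make the problem with extreme scores concrete. This is a sketch, not Winsteps code: the name `log_odds_of_score` is hypothetical, and the natural logarithm is assumed.

```python
import math


def log_odds_of_score(observed: float, minimum: float, maximum: float) -> float:
    """log((observed - minimum) / (maximum - observed)), natural log assumed."""
    if not minimum < observed < maximum:
        # At the minimum the numerator is zero and the log is undefined;
        # at the maximum the denominator is zero.
        raise ValueError("no finite value for a minimum or maximum raw score")
    return math.log((observed - minimum) / (maximum - observed))


print(log_odds_of_score(32, 0, 64))  # 0.0
print(log_odds_of_score(48, 0, 64))  # ln(3), roughly 1.0986

for extreme in (0, 64):
    try:
        log_odds_of_score(extreme, 0, 64)
    except ValueError as err:
        print(extreme, "->", err)
```

With a minimum of zero and a maximum of 64 the expression inside the log reduces to x/(64 - x), the curve given above.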

Finally I took the log (shown in pink), which completed the transformation from a straight line, not only to a curve, but to a curve with a double bend. The formula for this curve is:

y = ln(x/(64 - x))
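For readers without the diagram, the three formulas can be tabulated side by side. This is a minimal sketch; the function names are made up for the example, and only scores from 1 to 63 are used because the last two formulas have no finite value at 0 or 64.

```python
import math

N = 64  # beans per box, as in the text


def straight_line(x: int) -> float:
    return x / N                    # y = x/64


def odds(x: int) -> float:
    return x / (N - x)              # y = x/(64 - x)


def log_odds(x: int) -> float:
    return math.log(x / (N - x))    # y = ln(x/(64 - x))


print(" x   x/64   x/(64-x)   ln(x/(64-x))")
for x in (1, 16, 32, 48, 63):
    print(f"{x:2d}  {straight_line(x):5.3f}  {odds(x):9.3f}  {log_odds(x):13.3f}")
```

The last column is zero at x = 32 and takes equal and opposite values at 16 and 48. It changes slowly in the middle of the range and steeply towards either end, which is the double bend in the pink curve.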

I think that's enough for one day. In my next blog, I'll use Java to generate some random results from my beans in boxes model.

## 1 comment:

Jonathan, thank you for this post about Rasch measurement and Winsteps. Your post raises some crucial theoretical and practical issues. I plan to comment on your post in the June 2009 issue of Rasch Measurement Transactions: www.rasch.org/rmt

(Mike Linacre, author of Winsteps and Editor of RMT)
