Friday, April 3, 2009

The meaning of Rasch

A problem with high theory is that even if the original theorist never lost track of reality in his own mind, his readers and followers sometimes do. I shall look at what Rasch was doing in terms of beans in a box, because I find it helpful.

I don't have his book in front of me, but from memory, he began by suggesting that the probability of a child j answering item i in a test correctly might be a function of the ability of child j and the difficulty of item i. He then launched straight into his mathematical method for estimating underlying probabilities from observations. That's all very well if you really understand what's going on, and I'm sure he did. But for someone (such as myself) encountering the argument for the first time, it's easy to get bogged down in the mathematics and lose track of what it all means.
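For readers who want something concrete to hold on to, the function usually associated with Rasch's name is a logistic curve in the difference between ability and difficulty. The sketch below is not taken from his book; the names `ability` and `difficulty` and the function name are my own labels, and both quantities are assumed to sit on the same (logit) scale.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the standard Rasch model.

    Both arguments are assumed to be on the same logit scale, so only
    their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))
```

When ability equals difficulty the function returns 0.5, which is the sense in which a child and an item are "matched".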

I like to think of the probability of a child giving a correct answer to an item with "neutral" difficulty in terms of the proportion of red beans in a box containing red and blue beans. I like to think of the probability of an item being answered correctly by a child with "neutral" ability in terms of the proportion of red beans in another box containing red and blue beans. If a sample of j children sit a test comprising i items, one might think of one set of j boxes, each containing a different proportion of red beans, and another set of i boxes, each containing a different proportion of red beans.

The complexity of the estimation process should by now be obvious. Estimating the proportion of red beans in a single box by pulling beans one at a time from the box, recording the colour, and returning them, would be a laborious and time-consuming process. Now imagine each of the Set j boxes being combined in turn with each of the Set i boxes, and a single bean being pulled from each combination, and then using that data to estimate the proportions of red beans in each of the individual boxes.
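To put a rough number on "laborious", here is a small sketch of the single-box case only. It uses the textbook standard error of a proportion for draws with replacement; the function names, the seed and the use of Python's `random` module are my own choices for illustration.

```python
import math
import random

def standard_error(red_fraction, draws):
    """Standard error of the estimated red fraction after `draws` draws with replacement."""
    return math.sqrt(red_fraction * (1.0 - red_fraction) / draws)

def sample_box(red_fraction, draws, seed=0):
    """Simulate pulling a bean, recording its colour and returning it, `draws` times.

    Returns the observed fraction of red beans. The seed is fixed only so
    that the illustration is repeatable.
    """
    rng = random.Random(seed)
    reds = sum(1 for _ in range(draws) if rng.random() < red_fraction)
    return reds / draws
```

For a box that is half red, 100 draws leave a standard error of about 0.05. Because the error shrinks with the square root of the number of draws, cutting it to a quarter takes sixteen times as many draws.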

You don't have to be Einstein to realise the process is fraught with difficulty, and unless you are working with very large samples, the results will be somewhat haphazard.

An important claim of Rasch protagonists is that test results are independent of peers sitting the test at the time, and independent of the items set. To illustrate this claim, imagine a set of i test items being selected from an item bank of I items, and a set of j students out of a population of J students sitting the test. If a fixed pass mark is set, students are disadvantaged if they happen to encounter a harder than average set of items. And if a fixed proportion of students are allowed to pass, those who sit the test with a more able than average batch of students will also be disadvantaged. The essence of Rasch is that it iteratively takes item difficulty into account when estimating student ability, and takes student ability into account when estimating item difficulty.

The starting point is to assume each child has neutral ability. In terms of the beans analogy, "neutral" would mean that the child box contained an equal number of red and blue beans, so the probability of pulling out a red bean would be 50%. A child answering a test item is assumed to be like an unbiased person pulling a bean out of the item box. Several children answering the same test item is assumed to be like several unbiased people each pulling a single bean out of the item box, recording the colour, and returning the bean to the box. At the end of the process, the proportion of red beans selected gives an initial indication of the proportion of red beans in the box, or the easiness of the item.

Similarly, the initial estimate of student ability assumes each test item has neutral difficulty. In terms of the beans analogy, "neutral" would mean here that the item box contained an equal number of red and blue beans, so the probability of pulling out a red bean would be 50%. A test item being offered to a child is assumed to be like an unbiased person pulling a bean out of the child box. Several items being offered to the same child is assumed to be like several unbiased people each pulling a single bean out of the child box, recording the colour, and returning the bean to the box. At the end of the process, the proportion of red beans selected gives an initial indication of the proportion of red beans in the box, or the ability of the child.

The second pass of the iteration acknowledges that the children exhibit a range of abilities and the items are imbued with a range of difficulties, and this in theory gives a better estimate of both item difficulties and student abilities. The third pass uses the second estimate as a starting point and so on. The process continues until each successive pass makes a very small difference to the estimates.
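The passes described above can be sketched in code. What follows is one common way of doing it (alternating Newton steps on the logistic form of the model, often called joint maximum likelihood), not necessarily the procedure Rasch himself set out. The function names, the step limit and the convergence tolerance are my own assumptions, and the sketch assumes no child and no item has an all-right or all-wrong record, since those have no finite estimate.

```python
import math

def p_correct(ability, difficulty):
    # Logistic form: 0.5 when ability equals difficulty.
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def clamp(step, limit=1.0):
    # Limit each adjustment so that early passes cannot overshoot wildly.
    return max(-limit, min(limit, step))

def estimate(responses, max_passes=200, tolerance=1e-6):
    """responses[j][i] is 1 if child j answered item i correctly, else 0."""
    n_children = len(responses)
    n_items = len(responses[0])
    abilities = [0.0] * n_children    # first pass: every child "neutral"
    difficulties = [0.0] * n_items    # first pass: every item "neutral"
    child_scores = [sum(row) for row in responses]
    item_scores = [sum(row[i] for row in responses) for i in range(n_items)]

    for _ in range(max_passes):
        biggest_step = 0.0
        # Re-estimate each child, taking the current item difficulties into account.
        for j in range(n_children):
            probs = [p_correct(abilities[j], d) for d in difficulties]
            information = sum(p * (1.0 - p) for p in probs)
            step = clamp((child_scores[j] - sum(probs)) / information)
            abilities[j] += step
            biggest_step = max(biggest_step, abs(step))
        # Re-estimate each item, taking the current child abilities into account.
        for i in range(n_items):
            probs = [p_correct(a, difficulties[i]) for a in abilities]
            information = sum(p * (1.0 - p) for p in probs)
            step = clamp((sum(probs) - item_scores[i]) / information)
            difficulties[i] += step
            biggest_step = max(biggest_step, abs(step))
        # Fix the origin of the scale: average item difficulty is zero.
        shift = sum(difficulties) / n_items
        abilities = [a - shift for a in abilities]
        difficulties = [d - shift for d in difficulties]
        if biggest_step < tolerance:
            break   # successive passes now make a very small difference
    return abilities, difficulties
```

Called on a small table of 0s and 1s, with one row per child and one column per item, it returns a list of abilities and a list of difficulties on a common scale. With only a handful of children and items the estimates are exactly as haphazard as the bean picture suggests.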

To illustrate with the bean analogy, I shall take the analogy further from reality by allowing multiple sampling from a single child-item combination. Suppose a child box has been combined with an item box, and repetitive sampling produces red and blue beans in equal proportions. If the item is assumed to have neutral difficulty, so the red beans represent 50% of the total, it might be deduced from this that the child has neutral ability, and that the child box also contains 50% red beans. But if the item box is known to have 75% blue beans, and the combined box sampling indicated a 50-50 combination ratio, one might deduce that the child box contains a higher proportion of red beans (75% if both boxes contain the same number of beans).
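The deduction in the last sentence is simple arithmetic on bean counts, and can be written out as follows. This is a sketch of the analogy only, not of the Rasch mathematics; the function name and the default box size of 100 beans are assumptions made for the illustration.

```python
def implied_child_red(combined_red, item_red, child_beans=100, item_beans=100):
    """Red fraction the child box must have, given the combined and item boxes.

    combined_red: red fraction observed when the two boxes are tipped together
    item_red:     known red fraction in the item box
    """
    total_red = combined_red * (child_beans + item_beans)
    red_from_item = item_red * item_beans
    return (total_red - red_from_item) / child_beans
```

An item box with 75% blue beans has a red fraction of 0.25, so `implied_child_red(0.5, 0.25)` gives 0.75, the figure quoted above. With a neutral item box, `implied_child_red(0.5, 0.5)` gives 0.5.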

An approach based on this type of reasoning is sometimes used in instruments which claim to be "Rasch based". The Key Math Test (KMT) is one. Here a large number of students have sat the test and a full iterative Rasch analysis has been used to assign difficulty levels to the test items. The items are then arranged in order of difficulty. When a practitioner uses the test, if a student answers a single item or two items incorrectly, the child is offered a chance at the next item. If the student answers that item correctly, the first error is ignored, and the assessment continues. But if the child answers 3 consecutive items incorrectly, the ability of the child is deemed to correspond with the difficulty of the last correctly answered item.

Let's consider this in terms of the beans analogy. First we have to assume the proportions of red and blue beans in the item boxes have been accurately calculated, and the boxes have been organised in order of increasing difficulty. The boxes representing easy items have mainly red beans (because red represents a correct answer), and the proportion of blue beans increases as the items get harder. Let us imagine there are 19 boxes, each containing 100 beans, and that the number of blue beans increases in increments of 5 beans. The box representing the easiest item has 5 blue beans and 95 red beans; that representing the most difficult item has 95 blue beans and 5 red ones. Now imagine a child who gets the 3rd item wrong, the 4th and 5th items right, and the 6th, 7th and 8th items wrong. This child is deemed to have an ability corresponding with the difficulty of the 5th item. The 5th item box contains 25 blue beans and 75 red ones, so the child box is deemed to contain 25 red beans and 75 blue ones.
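The stopping rule and the imaginary boxes can be sketched as below. This is my reading of the rule as described here, not code from the KMT itself; the function names are hypothetical, and the child is assumed to have answered the 1st and 2nd items correctly, which the example does not state.

```python
def blue_beans_in_box(item_number, increment=5):
    """Blue beans in the box for a given item (1 = easiest, 19 = hardest)."""
    return increment * item_number

def deemed_item(responses, stop_after=3):
    """Item number whose difficulty the child's ability is deemed to match.

    responses: 1 for a correct answer, 0 for an incorrect one, in item order.
    Testing stops after `stop_after` consecutive incorrect answers, and the
    last correctly answered item is returned (None if nothing was correct).
    """
    last_correct = None
    wrong_in_a_row = 0
    for item_number, correct in enumerate(responses, start=1):
        if correct:
            last_correct = item_number
            wrong_in_a_row = 0   # an isolated error or two is ignored
        else:
            wrong_in_a_row += 1
            if wrong_in_a_row == stop_after:
                break
    return last_correct
```

For the child in the example, `deemed_item([1, 1, 0, 1, 1, 0, 0, 0])` returns 5, and `blue_beans_in_box(5)` gives the number of blue beans in the corresponding item box.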

Is this reasonable? In my opinion, it is a bit bold. If we were working with real boxes and real beans we could afford the luxury of multiple sampling. We could pull a single bean many times from each child-item combination, and when the samples showed close to 50% red beans we could impute the ratio of red beans in the child box from the known ratio in the item box. But in the KMT, the child, having answered the 5th item correctly and the 6th, 7th and 8th items incorrectly, is deemed to have had an exactly 50-50 chance of answering the 5th item correctly. This is a rather bold leap from a one-bean sample. Put simply, and without attempting to put confidence levels on a precise range of error, the child might have been lucky on the 4th item or unlucky on the 6th, or even the 7th item. The measurement is imprecise.
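One way to see how little a single run of answers pins down is to ask how probable the observed pattern would be for several different child boxes. The sketch below stays inside the bean analogy: it assumes 100 beans in every box, treats each answer as one independent draw from the combined child and item boxes, and assumes the 1st and 2nd items were answered correctly. The function names are my own, and none of this is the Rasch calculation proper.

```python
def chance_of_red(child_red, item_number):
    """Chance of a red bean from the combined child and item boxes (equal sizes)."""
    item_red = (100 - 5 * item_number) / 100   # box 1 has 95 red, box 19 has 5
    return (child_red + item_red) / 2

def pattern_likelihood(child_red, responses):
    """Probability of seeing exactly this run of right (1) and wrong (0) answers."""
    likelihood = 1.0
    for item_number, correct in enumerate(responses, start=1):
        p = chance_of_red(child_red, item_number)
        likelihood *= p if correct else (1.0 - p)
    return likelihood

observed = [1, 1, 0, 1, 1, 0, 0, 0]
for child_red in (0.0, 0.25, 0.5):
    print(child_red, pattern_likelihood(child_red, observed))
```

Under these assumptions the likelihoods for child boxes holding 0%, 25% and 50% red beans come out within a factor of two of one another, so the observed answers do very little to separate children of quite different ability.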
