Tuesday, July 28, 2009

Poisson Estimation

In Chapter two of his book, Rasch jumps from his Equation 6.1 to an approximation, which he attributes to Poisson, but he neither provides a derivation of the approximation nor writes it out in general terms.

I have found a page on the web, which sets out a cursory but satisfactory explanation of the approximation. Rasch calls the approximation Poisson's Law. The page on the web calls it the Poisson distribution or the Law of Rare Events.

Let's begin with my Equation 3 from my previous blog.

p{a|n} = (n!/((n-a)!a!))θ^a(1-θ)^(n-a) (3)

Then according to the argument, if you focus on the right-hand term, (1-θ)^(n-a), you can approximate that as:

(1-θ)^(n-a) ≈ e^(-nθ) (4)

where ≈ means "is approximately equal to" and θ is assumed to be small. The derivation involves isolating that term and taking the natural log:

ln((1-θ)^(n-a)) = (n-a)ln(1-θ) (5)

It then uses another "approximation", which is not proven or explained:

ln(1-θ) ≈ -θ, where θ << 1 (6)

I have no idea why this works, but I checked it for a few values for θ as shown in the array below:

θ        0.050   0.040   0.030   0.020   0.010
1-θ      0.950   0.960   0.970   0.980   0.990
ln(1-θ) -0.051  -0.041  -0.030  -0.020  -0.010
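I have since gathered that -θ is just the first term of the Taylor series ln(1-θ) = -θ - θ²/2 - θ³/3 - ..., so the error should be roughly θ²/2. Here is a quick Java check of approximation 6 (my own sketch, not from the web page):

```java
// Numerical check that ln(1 - theta) is close to -theta for small theta.
public class LogApprox {
    // Absolute error of the approximation ln(1-theta) ≈ -theta.
    static double error(double theta) {
        return Math.abs(Math.log(1.0 - theta) - (-theta));
    }

    public static void main(String[] args) {
        for (double theta : new double[] {0.05, 0.04, 0.03, 0.02, 0.01}) {
            System.out.printf("theta=%.3f  ln(1-theta)=%.4f  -theta=%.4f%n",
                    theta, Math.log(1.0 - theta), -theta);
        }
    }
}
```

The errors it prints shrink quadratically, which is consistent with the θ²/2 estimate.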

The derivation then uses another approximation:

(n-a)(-θ) ≈ -nθ, where θ << 1 (7)

Let's have a closer look at this.

(n-a)(-θ) = -nθ + aθ

So they are saying aθ disappears if θ is very small, which strikes me as a bit bold (surely it only disappears relative to nθ when a << n), but if you plug approximations 6 and 7 into expression 5, you get:

ln((1-θ)^(n-a)) ≈ -nθ, where θ << 1 (8)

Taking the exponential of both sides takes you back to approximation 4.

Next, if you focus on the term n!/(n-a)!, according to the argument on the web,

n!/(n-a)! ≈ n^a, where a << n (9)

I won't copy out the derivation of this one, but if you substitute 4 and 9 back into a slight rearrangement of expression 3 you get:

a!p{a|n} ≈ n^a θ^a e^(-nθ)

p{a|n} ≈ (nθ)^a e^(-nθ)/a! (10)

This can be further simplified if we define nθ as λ, the expected frequency of events:

p{a|n} ≈ λ^a e^(-λ)/a! (11)
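To see how good the approximation actually is, here is a Java comparison of the exact binomial probability from expression 3 with the Poisson approximation of expression 11 (my own sketch, not from Rasch or the web page):

```java
// Compares the exact binomial probability (expression 3) with the
// Poisson approximation (expression 11) for small theta and a << n.
public class PoissonCheck {
    // Exact binomial: (n!/((n-a)!a!)) * theta^a * (1-theta)^(n-a),
    // with the binomial coefficient built multiplicatively for stability.
    static double binom(int n, int a, double theta) {
        double coeff = 1.0;
        for (int i = 0; i < a; i++) {
            coeff = coeff * (n - i) / (i + 1);
        }
        return coeff * Math.pow(theta, a) * Math.pow(1.0 - theta, n - a);
    }

    // Poisson approximation: lambda^a * e^(-lambda) / a!, with lambda = n*theta.
    static double poisson(int n, int a, double theta) {
        double lambda = n * theta;
        double fact = 1.0;
        for (int i = 2; i <= a; i++) fact *= i;
        return Math.pow(lambda, a) * Math.exp(-lambda) / fact;
    }

    public static void main(String[] args) {
        // n = 100 trials, theta = 0.02, so lambda = 2 expected events
        for (int a = 0; a <= 5; a++) {
            System.out.printf("a=%d  binomial=%.5f  poisson=%.5f%n",
                    a, binom(100, a, 0.02), poisson(100, a, 0.02));
        }
    }
}
```

With these numbers the two columns agree to within a few parts in a thousand, which is reassuring.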

When I first read this chapter, I saw that an approximation was being claimed, and it just increased my headache. Now that I have looked at a derivation of the approximation, I can emphasise the assumptions being made, and make an observation.

The assumption behind approximation 8 is that the probability of an individual event is very low (θ << 1), and the assumption behind approximation 9 is that the observed frequency of the events is low relative to the number of trials (a << n). In theory, one implies the other, and when applied to a reading test, where the expected frequency of errors is low, the theory may translate well into practice.

But according to conventional psychometric theory (and I'm sorry I don't have references to hand), best results are achieved in a dichotomous test when the probability of success on individual items covers the mid range. This needs to be borne in mind as I proceed through the Rasch argument.

Thursday, July 23, 2009

Building Rasch Formulae

Towards the end of his introduction, Rasch describes a simple probabilistic model involving mice. I shall not use his exact notation because it is not convenient for me to put bars over characters. So where Rasch describes an outcome A or its barred complement, I shall describe the outcomes as 1 or 0. The probability of outcome 1 is q, and the probability of outcome 0 is 1-q.

The thing about probability is not that it is hard, but that it is tedious. What makes it hard is that people who work with probability all the time have, to reduce the tedium, developed notations. They also jump steps. I guess for them the fewer the steps, the lower the tedium, but for people (like me) who don't work with it a lot, it has the effect of combining tedium with headache. I am sure this is one of the reasons why Rasch is not well known by people who walk up and down high streets carrying shopping bags.

Rasch jumps with his mouse model straight into an expression which looks to the layman like gibberish. I shall move in steps small enough for me to understand them. So the probability that the outcome of a single event is 1 is given by:

p{1} = q (1)

If there are two events, the probability that both outcomes are 1 will be q^2. If there are three events, the probability that three out of three outcomes will be 1 is q^3. Conversely, the probability that three out of three outcomes will be zero is (1-q)^3. This is pretty easy because I am skirting along the outer edge of the outcome tree.

It gets harder to predict one or two results of 1 because there are multiple paths to get there. Take one result of 1. The result of the first event might be a 1, followed by two zeros. Or the result of the first event might be a zero, followed by either a 1 and a zero, or a zero and a 1. The probability of any one of these paths would be q(1-q)^2, but as there are three such paths, they must be added together to give the overall probability of a single 1 out of three events (or trials). Using similar notation to Rasch:

p{1|3} = 3q^1(1-q)^(3-1) (2)

This is a nice little formula, in so far as it goes, but it is not quite Equation 6.1 given by Rasch on page 12 of his book. It is on the number of paths that it falls down, because to some extent a single occurrence of an event is a special case.

Of course I realise that probability experts covered this in kindergarten, but I am not a probability expert. I did pure and applied math at school: ladders leaning on walls and simple harmonic motion. I left probability to the ubergeeks. Sometimes I regret that decision. We had a very good math master. He was very systematic, very methodical, and he made everything very easy.

Returning to the mice, I think I'll convert them to coins, because for me they are easier to manipulate both conceptually and in practice (they stay still for longer). Following in the tradition of my old math master, I am building up in small steps. I have two coins in front of me. I note there are two ways of displaying a head and a tail.

Now I have three coins in front of me. I am counting the arrangements of 2 heads and 1 tail. Essentially the tail can be in any one of three positions, so there are three possibilities, as shown in equation 2 above. Similarly with four or five coins, if there is only one tail (or one head), the number of combos is the same as the number of coins.

Now I have four coins in front of me and I am counting the arrangements of 2 heads and 2 tails. If I put both the heads on the left, both the tails are on the right and there is no movement; that is one possibility. If I hold one head on the left, the other head is loose in 3 coins, which gives three positions, except that you can't allow the head at the left of the three, because that arrangement has already been counted, leaving two possibilities. If I hold no heads on the left, one tail is loose in three coins, which gives three more possibilities. The total is 1+2+3=6.

Now I have five coins in front of me and I am counting the arrangements of 3 heads and 2 tails. If I put all the heads on the left, all the tails are on the right and there is no movement; that is one possibility. If I hold 2 heads on the left, the third head is loose in 3 coins, which gives three possibilities, of which only two count, as explained above. If I hold one head on the left, at first sight, two heads are loose in four coins, which would add six possibilities, but of course that is not really the case, because only one head is on the left, which means the next coin must be a tail, so in fact you have a single tail loose in 3 coins, which has 3 possibilities. And if I hold one tail on the left, the remaining tail is loose in four coins which adds four more possibilities. Altogether that is 1+2+3+4=10.

My final experiment is with six coins and I shall begin with 4 heads and 2 tails. Holding all the heads on the left, all the tails are on the right and there is no movement; that is one possibility. If I hold 3 heads on the left, the fourth coin has to be a tail, which leaves a head and a tail to alternate positions - two more possibilities. If I hold two heads on the left, forcing the third coin to be a tail, one tail has complete freedom of movement through three coins, adding three possibilities. One head on the left followed by a tail leaves one tail complete freedom of movement through four coins, adding four possibilities. Finally one tail on the left leaves the remaining tail free in five coins which adds another five possibilities. That is 1+2+3+4+5=15.

Finally for 3 heads and 3 tails I begin with 3 and 3, which is one combo. Then I'll hold 2 heads and a tail on the left, leaving the remaining head free in 3 coins, adding 3 combos. Holding 1 head and a tail on the left leaves two heads and two tails free, which was covered three paragraphs above and adds 6 possibilities. And holding a tail on the left leaves 3 heads and 2 tails as discussed two paragraphs above and adds 10 possibilities. That is 1+3+6+10=20.
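Before moving on, here is a brute-force Java check of those counts (my own sketch): it simply enumerates every heads/tails sequence of a given length and counts the ones with the right number of heads.

```java
// Brute-force check of the coin counts: enumerate every heads/tails
// sequence of length n (as the bits of an integer) and count those
// with exactly k heads.
public class CoinCount {
    static int arrangements(int n, int k) {
        int count = 0;
        for (int bits = 0; bits < (1 << n); bits++) {
            if (Integer.bitCount(bits) == k) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("4 coins, 2 heads: " + arrangements(4, 2)); // 6
        System.out.println("5 coins, 3 heads: " + arrangements(5, 3)); // 10
        System.out.println("6 coins, 4 heads: " + arrangements(6, 4)); // 15
        System.out.println("6 coins, 3 heads: " + arrangements(6, 3)); // 20
    }
}
```

It agrees with the manual totals of 6, 10, 15 and 20 above.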

I'd love to say I could see a pattern in all of this, but at this stage I really can't. What I can do is cheat a little and look for a similar formula on the web. The notation seems to have changed a little, but making allowance for that, the probability of a specified event occurring a times out of n trials seems to be:

p{a|n} = (n!/((n-a)!a!))q^a(1-q)^(n-a) (3)

I tried it with the last two examples above, and the term with all the factorials in it seems to yield the same number of paths through the outcome tree as my manual method. I'd love to find a step by step derivation of that term*, but I haven't time. In the absence of that, at least I have put some pith on an expression, which on my first reading of the Rasch book, was not very meaningful to me.

*I have subsequently found a good one here.
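The factorial term itself is easy to check in Java against the manual counts (again my own sketch, not from the derivation I found):

```java
// The factorial term n!/((n-a)!a!) from expression 3, checked against
// the path counts obtained by moving coins around by hand.
public class PathCount {
    static long factorial(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    static long paths(int n, int a) {
        return factorial(n) / (factorial(n - a) * factorial(a));
    }

    public static void main(String[] args) {
        System.out.println("paths(4, 2) = " + paths(4, 2)); // 6
        System.out.println("paths(5, 3) = " + paths(5, 3)); // 10
        System.out.println("paths(6, 4) = " + paths(6, 4)); // 15
        System.out.println("paths(6, 3) = " + paths(6, 3)); // 20
    }
}
```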

Wednesday, July 22, 2009

Reading Rasch

My blog will be (more) shambolic and disorganised (than usual) for the next few posts. I am reading a classic text to clarify my understanding of a topic which has been rattling around in my brain for years. I find it helps to make notes when I read a technical text, and the more technical the text, the more detailed the notes have to be.

I mentioned in my previous blog the sleight of hand by which the Rasch parameters (individual ability and item difficulty) can be estimated independently of one another. I have seen the formulae and I have read the text, but it does not sit comfortably with me. You cannot begin the estimation process for item difficulty without individuals, and vice versa.

I think some confusion arises because in much of the subsequent literature the term "logits" is used, whereas, in the original book Rasch emphasises that his parameters are pure numbers - they don't have units. If you try to create units (for whatever reason), in my opinion, you lose objectivity. It is only when you leave the parameters as pure numbers that you preserve objectivity.

So if you apply a set of 3 items to a population of students, the Rasch difficulty parameters (RDP) for those items may turn out to be 1, 2 and 5. If you apply the same items to a different population, I am not sure whether the RDP will be exactly the same, or in the same ratio. My gut feeling is the latter, but I'll firm up that view (or not) as I go through the original argument.

I'm on page 4 at the moment, and reading the claims, rather than the argument which proves them. A problem I have when reading something written fifty years ago, especially written by someone born a hundred years ago, is that the author sometimes finds exciting stuff which, in the context of today, I find less exciting. Exacerbating this, in an original text, which has subsequently been written about a lot, the author, making the case for the first time, emphasises stuff which the later reader takes for granted, or at least has read many times before.

For me, having enjoyed the movie Jurassic Park, what makes Rasch interesting is his use of a probabilistic model, as distinct from traditional deterministic models. Among my friends I describe Rasch Theory as Chaos Theory for Social Scientists. Rasch, on the other hand, in his introduction, seems most excited about the distinction between the "aggregates" approach of traditional statistical analysis and the (individual) parametric approach of his method. He is also fairly (and not unreasonably) excited about the elimination of one or other of his parameters from his equations.

On page 5, Rasch introduces his work with reading tests. He emphasises the importance of having access to results from two tests applied close together in time. He also collected data from two distinct student populations - a so called "normal" population, and a second dataset from students requiring remedial tuition in native language skills.

Still on page 5, Rasch introduces charts shown on the ensuing pages. I remember reading this for the first time and straining my brain for something new and interesting in the charts. But there was nothing. Contrary to the claim by Rasch on page 3, paragraph 3 (of the whole book) that "statistical tools ... such as correlation coefficients ... have found no place in our investigations", these charts are essentially those shown by modern software when calculating repeated measures reliability coefficients. These are essentially correlation coefficients between the scores in two similar tests taken one after the other. Granted, Rasch protests on page 5 that "we are not going to consider this as a case for correlation analysis [but rather] glance at the figures ... and look for the possibility of translating uniformly from the results in [the first test] to results in [the second test]", but methinks that's protesting too much. Where would the harm be in calculating the correlation coefficient and then proceeding with the original analysis?

On page 10, Rasch explicitly discusses the merits and demerits of deterministic versus probabilistic models. I had forgotten that was in there and I am glad it is, because as I said above, it is what for me makes Rasch interesting. He gives Newton's laws as a classic example of the deterministic model, and the kinetic theory of gases and the rules of radioactive decay as examples of a probabilistic approach in physics. He then describes human behaviour as essentially random, and suggests probabilistic modelling is better suited to psychometrics than any deterministic approach.

I am also delighted to observe that Rasch illustrates his argument with a "ball-drawing game", randomly pulling red and white balls from a bag. I had completely forgotten that he did this, and reading it now makes me feel much more confident about continuing with the models I described in some of my earlier blogs. Admittedly he then jumps from balls to mice (perhaps because mice, being alive, have more in common with children), but I shall continue to use balls, because the theory works just as well with them.

Sunday, July 19, 2009

Dichotomous data and rates

At last I have in my possession the Rasch book. Amazon offers it second hand from $235 (as at today). By going through the Institute for Objective Measurement, I got it for $30.

Reading the foreword, there are two important things which Rasch made clear to other people. The first is that if a test is constructed to produce dichotomous data, it must be properly graduated. In the guts of the book Rasch produces a model for data generated by a properly spaced test, and then defines parameters which have to be met by real data, for that data to be said to fit the model. The second is almost a sleight of hand: he states that the probability of one result is a function of the ability of the person and the difficulty of the test, and that the probability of another result is a similar function of the same ability and difficulty. He then combines the two using a couple of rules for the probability of a specified combination of events, and makes one of the parameters (either ability or difficulty, depending on his mood) disappear, thus creating an objective measure of the remaining parameter.

The most interesting thing from my perspective is that the example given in the foreword does not concern dichotomous data but rather reading rates. Yet the first four chapters of the main book deal entirely with dichotomous data, and almost all the subsequent theory and research deals with dichotomous data. I think for the sake of completeness, I will revisit these chapters, and then I hope to develop some code based on the chapters which deal with reading rates.

Wednesday, July 15, 2009

Debugging Applets

In the Exceptions Trail of the Java Tutorial, the catch Blocks lesson gives the following as an example of a catch block:

} catch (IOException e) {
    System.err.println("Caught IOException: "
                       + e.getMessage());
}

This works fine if you are running an app from the command line, or even an applet using appletviewer, but it is absolutely useless if you are testing an applet in an ordinary browser.

My applet is supposed to record performance data, and I found it frustrating not knowing whether it had actually written anything to the database without running a separate query on the database. I therefore added a GUI object to the applet to display the message:

if (dbDEBUG) {
    jTADiagnosics = new JTextArea();
    jTADiagnosics.setFont(new Font("Arial", Font.PLAIN, 14));
    c.insets = new Insets(0, 10, 10, 10); //12oc,9oc,6oc,3oc padding
    c.gridx = 0;      //aligned with button 2
    c.gridy = 8;      //eighth row
    c.gridwidth = 3;  //3 columns wide
    add(jTADiagnosics, c);
}

and a method to display errors in this text area:

private void addItem(String newWord) {
    if (dbDEBUG) {
        // append the diagnostic message to the text area
        jTADiagnosics.append(newWord + "\n");
    }
}

Then for the reasons explained in my previous blog, the arguments passed to this method are the return values of the method used to do the data related stuff - loading the driver, making the connection, and most importantly, adding lines to the database.

On reflection, I think I have written these two blogs back to front, and the last one should have come before this one, but who cares? My intention now is to leave coding for a while and get back to Rasch theory.

Thursday, July 9, 2009


When I was using Visual Basic I never really got to grips with functions. I knew how to use them, and liked them, if someone else wrote them. For example the function log(x) returns the logarithm of x, and is often handy to use. But I took the view that all the useful functions had already been written, and I could not see why I should ever need to write one myself.

On the other hand, I found simple subroutines very useful, especially for chunks of code which might be called in more than one circumstance. But I never made a conceptual link between a function and a subroutine.

In Java, what VB calls a function is called a method, and what VB calls a subroutine is called a void method, so one is led to think about the similarities between the two constructs.

My stereotype of a function is a lump of code with a single unambiguous purpose. For example the function log(x) would never be used for anything except to calculate the logarithm of a number. It was only when I started coding in Java that it occurred to me that what is returned by a method might be a by-product of code whose principal purpose is to do something completely different.

For example in my Java Math Test, I have an applet which displays the user interface, and supporting classes which do the grunt work, including most recently the data connection. I had got beyond testing with the command line applet viewer, and wanted to try the applet in a regular browser calling files from a web server. As I could no longer use standard output for debugging messages, I wanted to display a message in a GUI object on the applet itself.

Most of the stuff done by my supporting classes is done by calling void methods. But in order to check whether or not a connection had been made, I dressed up the connection code in a method returning a string. The code looked as follows:

public String dataConn() {
    try {
        String strconn = "jdbc:derby://";
        liveconn = DriverManager.getConnection(strconn);
        smt = liveconn.createStatement();
        buffer = "connection succeeds ";
    } catch (Exception ex) {
        buffer = "connection fails ";
    }
    return buffer;
}

And it was called by the simple expression:


Where addItem is a method on the main applet which adds short diagnostic messages to a string of messages and displays them in a text label. On the surface (at least from where it is called) it looks as if dataConn is a method or function designed to do something with text, whereas in fact its primary purpose is to make a data connection, and the text it returns is a by-product for diagnostic purposes.

Sunday, July 5, 2009

Coding for Rasch

This is my first day back on my main project in two months. This is the page I wanted to write after my 24 April post. The question I want to address is: what fields do I need to store data for subsequent Rasch analysis?

Rasch analysis investigates the interaction between test items and test participants. Core fields would therefore include the test item, the participant, and results of the interaction.

Recording the item is pretty easy. If a math student is asked for the product of two and two, the item is 2x2=?

Recording the student is a bit harder. It opens up a whole hornets' nest of privacy and data protection issues. However, the analysis does not really require demographic details. All we need to record is that an entity addressing Item A of a session was the same entity addressing Item B in the same session. It does not matter that the same person may later complete another session. From a Rasch perspective, it is a new entity. It may be the same person a little older. It may be the same person with a headache, or feeling tired. It may even be the same person feeling the same; or it may be a different person with similar performance capabilities. All that matters is that data from a specific session is identified for the purpose of the analysis. For the time being I think a date stamp (in milliseconds) will suffice. If the tool ever became really popular, perhaps the date stamp could be augmented by an IP address.

For the interaction details, the Boolean result of the interaction is essential for conventional Rasch analysis, and for my preferred scoring rate metric the time (in milliseconds) taken to achieve that result is also essential. That can be converted into a scoring rate for the item, expressed as correct answers per minute (capm).

That will probably do for the main raw data table. That will be the data recorded in each session.
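To make that concrete, here is a sketch in Java of the raw table and the capm conversion. The table and column names (RASCH_RAW, SESSION_STAMP, ITEM, CORRECT, RESPONSE_MS) are placeholders of my own and not settled yet:

```java
// A sketch of the raw-data table and the scoring-rate conversion.
// All table and column names here are provisional placeholders.
public class RaschSchema {
    // One row per item/participant interaction in a session.
    static final String CREATE_RAW_TABLE =
        "CREATE TABLE RASCH_RAW ("
        + "SESSION_STAMP BIGINT NOT NULL, "  // session date stamp in ms
        + "ITEM VARCHAR(32) NOT NULL, "      // e.g. '2x2=?'
        + "CORRECT SMALLINT NOT NULL, "      // Boolean result, 0 or 1
        + "RESPONSE_MS BIGINT NOT NULL)";    // time taken in ms

    // Correct answers per minute (capm) for a single correct item.
    static double capm(long responseMillis) {
        return 60000.0 / responseMillis;
    }

    public static void main(String[] args) {
        System.out.println(CREATE_RAW_TABLE);
        System.out.println("capm for a 5-second answer: " + capm(5000));
    }
}
```

The CREATE TABLE string would eventually be executed through the Derby client driver described in my earlier posts.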

But what information should be served back to the participant? The most ordinary drill and practice program gives the user a raw score as they progress through an activity. Some (such as Mathletics) introduce the element of time, but usually following the model of a conventional speed test. From a psychometric perspective this is fraught with theoretical problems which have been amply discussed in the academic literature, although I don't have any references at my fingertips just now. Few, if any, give a scoring rate on individual items.

We can do that and go a stage further. Data drawn from the raw table can be processed to produce estimations of item difficulty. These estimations can be combined with real time interaction data to feed back estimations of student ability. This information can be displayed for students as they use the tool.

Saturday, July 4, 2009

Linguistic Rules

I have been silent for six weeks because I have been bogged down in theory. At the end of my last blog I reported that I had been urged to revisit the core language lessons. So I went right back to the concepts trail and looked up the Inheritance lesson to better understand the extends keyword, and the Interface lesson to better understand the implements keyword. Then in the Language Basics trail the Variables lesson refreshed my memory on the static keyword, Class Variables (Static Fields), Instance Variables (Non-Static Fields), and Local Variables. The this keyword is described in the Classes and Objects trail in a lesson by the same name.

After re-reading these lessons I was able to better understand the posts in this thread, at the end of which I posted some code, which for me tied it all together and solved my event listener problem.

I went on to have problems with Window Listeners, but they arose because of my obsession with walking before I run. My main project is going to be web based, so I shall never need a window listener. I was fiddling around with a JFrame, because I had a mental blockage with applets, perhaps because building an applet represents the end game for me.

When I bit the bullet and embraced the Applet Trail I was reminded that an applet has two built-in "closing" methods, stop and destroy. I therefore had no need for a window listener. However, there was another problem I had not bargained for: security restrictions, of which the most serious for me was the inability to read or write files on the host.

When I first read this I gave it little attention because I don't want to write to the client, I want to store information at my end. But as I muddled my way through the mire of data connections, I learnt that the (easiest to use) embedded Derby database engine writes on the local host.

I therefore needed to take another bull by the horns and embrace the Apache Derby Documentation. This is well written, with easy to follow examples, but there was a lot to learn. Essentially I needed to run the NetServer application on my web server, specify that it should receive connections from anywhere, and then use the client driver with my applet.

There was no need for any other supporting classes. As long as the communication was with a server app running at the same address from which the applet was downloaded, all the code could be contained in the single applet. Put another way, calling supporting non-applet classes from the applet to make the data connection is not a way around the security restrictions. I had seen some sample code which used supporting classes to make a connection, and I foolishly thought I could use it to break the rules, but I could not.

Instead, I ended up making a successful test connection straight out of an applet, and I posted the code at the end of this thread for future reference.