Wednesday, December 9, 2009

The nub

Sometimes people ask why I do what I do. Why do I write software nobody wants to use? Why do I care about the reliability and validity of psychometric data? Why do I want to make available assessment tools, which are accurate and meaningful, as well as cheap and easy to use?

I answer a question with a question. How is it, I say, that scientists such as Newton and Einstein were born at exactly the right time and in exactly the right place for their work to be understood, appreciated, tested, and applied?

The answer, I reply to myself, is that Newtons and Einsteins have been born in mud huts throughout the globe, throughout history; and they are still being born. The Newton we read about happened to be the lucky one born in the right place at the right time, wealthy enough to receive an education and to pursue study as a vocation, in a society disposed to listen to him rather than ignore him, imprison him, or burn him at the stake. Einstein was also a lucky one, born at a time when his more esoteric theories could be put to the test and applied.

My mission is to dredge out the genius buried in the mud huts and slums of the world.

Why do I care? I care because I know it is there. I have seen children begging in the slums of India, with minds crying out to be heard.

I have another blog, which I use to express disdain for the society in which we live, so I shall refrain from raving on here. I shall simply suggest that every intelligent child lifted from the mire may contribute to lifting society itself from the mire, in later life.

Tuesday, October 20, 2009

Marketing

I am crap at marketing. In a perfect world, I'd never have to sell anything. I'd just be. But the world is not perfect. And why waste time creating computer software if you can't sell it?

One of the reasons I began the quest to learn Java was that the version of my software written in VB6 is sitting on 600 CDs, which now gather dust in my living room. I spent a year writing to, calling, and visiting schools, but the revenue generated was barely compensation for the installation time, let alone the marketing, let alone the development.

Out of 221 schools on the target list (essentially primary schools in the metropolitan area of Perth, Western Australia), 61 agreed to a meeting and a trial on at least one computer, and 16 actually paid for a full site license. The official price was $660, but I let some have it for half that, and I always threw in a day's worth of installation and training. I didn't charge for the two or three presale visits, and for the three to six month evaluation and decision making period, I had no income. After a year of this, and a negative income after operating costs, I pulled up stumps and let the software rot.

Was the product crap? Of course I am biased, but I don't think so. A group of schools in the remote Pilbara region used Federal Government grants to fly me out to install the software and train staff. A number of schools paid me to address "PD" staff training sessions. Two regional radio stations interviewed me on the software and what it was trying to do.

What went wrong? Essentially, I ran out of steam. For one person to develop, market, and support a product is too much to ask.

Successful "education" software is mainly produced by games companies, which have an existing marketing and support infrastructure. Even the software itself is written mainly by games developers, and the curriculum content is traditionally minimal.

Latterly, some software with slightly higher quality content has begun to appear on the web. Mathletics is one example. My daughters use it at school, and for a while they were keen to use it for homework. But their interest waned after a few weeks. Now they use my software at home just as often, perhaps to please me.

The bottom line is that designing software that covers the curriculum, records progress, and is sufficiently interesting to captivate the attention of children for more than a few hours is a task so enormous, so difficult, and so expensive that neither the public nor the private sector has yet attempted to do it properly.

I freely admit that both my CD software and my web-based software only scratch the surface of the primary school math curriculum. But I'm not going to sit at home and dedicate the rest of my life to reinventing the wheel if I can't get people to use what I have done already.

Selling my web-based software really shouldn't be that hard. It's free! But so far, offering it to just two schools to try out has been like trying to push shit up hill.

The first was the local school in the country town where I now live. I have lived here, repairing computers, for five years, I have four children, and I sit on the local Council, so getting an appointment was not a problem. And the meeting went very well. I had been told that the new deputy principals, a husband and wife team, were dead keen on computers and on their application in education. And sure enough, they understood what I was saying and were enthusiastic about what they saw. They said they would address the next staff meeting on the subject and get back to me.

Silence followed.

Many weeks later the same team were invited to a Council function. A few days before the function, I put in a phone call to ask what had happened, but the call was not returned. At the function itself, where I was in the role of host rather than salesman, I sought them out again to ask what had happened. Since they could not run away or avoid me, I sensed awkwardness, even fear. After much vacillation and skirting around the subject, all I got out of them was that I would need to talk to the principal. So far I have not bothered. The application is not permanently hosted yet, and it would be premature to make a fuss. I called on them because I had been told they would be interested. They were, but some invisible barrier rose up and prevented further progress.

The second was my daughters' primary school. Again, as a parent, I had easy access to an audience with the school principal, and just as with the first school, she purported to like what she saw. She even later wrote a polite note extolling the virtues of the software, but enclosed the CD, which she was returning to me. So I tried the guy who emails the school newsletter, and he passed me to the lady who runs the computers. I got the CD back to her, but after a couple of failed attempts to meet (my contact with the girls is infrequent and erratic), she too wanted to return the CD, saying the school had access to "plenty of resources". I'll remember that the next time they write to parents asking for money.

Prior to contacting these schools, I had assumed they would be friendly, and suitable candidates for a pre-release trial of the software, tolerant of glitches, and willing to try subsequent editions. They have made it clear that they are not. And while ten years have passed since I was last knocking on doors, and awareness of technology has improved, teachers are as conservative as ever, entrenched in their daily routine, and deeply suspicious of anything unfamiliar.

Sunday, October 18, 2009

More on Scoring Rates

Following on from my last blog, the table below shows the raw scoring rates on two additional items, H and K. Item H is twice as hard as item I, and is only addressed by Student B. Item K is twice as easy as Item J and is only addressed by Student A. This scenario synthesises one which might be generated by a computer based adaptive arithmetic test, which presents more difficult items to more able students and easier items to less able students.

Raw Rates       Item H   Item I   Item J   Item K   Session Mean
Student A          -        4        8       16        9.33
Student B          4        8       16        -        9.33
Item Mean          4        6       12       16        9.33

From the table, the effect of the adaptive component of the computer based arithmetic test has been similar to that of a very good handicapper in a horse race. By presenting more difficult items to the more able student and easier items to the less able student, it has produced a dead heat in the result. An examiner looking at the raw rates might be misled into thinking that both students had the same ability. Hence the need to adjust the results to take into account the difficulty of the items presented to each student.

Similarly the adaptive component of the test has distorted the item mean scores of those items presented to only one student. Take Item H. The item mean scoring rate is shown as 4. However, had the item been presented to Student A, from the stated assumptions of the example, one might have expected the scoring rate to have been 2 capm, and the item mean scoring rate would then have been 3, not 4. In the case of Item K, the item mean scoring rate is shown as 16. From the stated assumptions of the example, had this item been presented to Student B, one might have expected the scoring rate to have been 32 capm, and the item mean scoring rate would then have been 24. Item H has been made to look relatively easier than it is, because it was only presented to the more able student, and Item K has been made to look relatively harder than it is, because it was only presented to the less able student.

The scoring rate quotients calculate out as follows:

Quotients       Item H   Item I   Item J   Item K   Session Mean
Student A          -      0.43     0.86     1.71       1.00
Student B        0.43     0.86     1.71      -         1.00
Item Mean        0.43     0.64     1.29     1.71       1.00

The session rates can then be adjusted, using the item quotients to calculate the adjusted rate. I am reversing the order here from my previous blog. In the previous blog I adjusted the item rates first, but in this example, it is the session rates which most clearly "need" adjusting, and the item rates in fact cannot be adjusted.

Adjusted Rates  Item H   Item I   Item J   Item K   Session Mean
Student A          -      6.22     6.22     9.33      7.26
Student B        9.33    12.44    12.44      -       11.41
Item Mean        9.33     9.33     9.33     9.33      9.33

The adjusted mean scoring rate for Student B is now higher than that for Student A, but not by a factor of 2. Just looking at the numbers for this example, it is very clear that the data set is "incomplete". The "missing" data from Student A on Item H and from Student B on Item K is distorting the results. The session quotient method of transforming the data offsets the distortion partially, but not completely.

And in this example, adjusting the item rates, using the session mean quotients, is not very useful, as the session means were identical, and the session mean quotients were all unity. It follows that iterations would not achieve much, because no matter how many times you divide by one, you move no further forward.
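For my own benefit, here is a minimal Java sketch of the adjustment used in the tables above. The class and method names are my own inventions for illustration only; the point is simply that each observed rate is divided by its item mean quotient and the results averaged per student, with missing cells left out rather than treated as zero.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.HashMap;

public class SparseRateAdjuster {

    // rates.get(student).get(item) = raw scoring rate in capm; missing cells are simply absent
    public static Map<String, Double> adjustedSessionMeans(Map<String, Map<String, Double>> rates) {
        // grand mean over all observed cells
        double sum = 0;
        int n = 0;
        for (Map<String, Double> row : rates.values()) {
            for (double r : row.values()) { sum += r; n++; }
        }
        double grandMean = sum / n;

        // item mean = mean of the observed rates on that item; item quotient = item mean / grand mean
        Map<String, Double> itemSum = new HashMap<>();
        Map<String, Integer> itemCount = new HashMap<>();
        for (Map<String, Double> row : rates.values()) {
            for (Map.Entry<String, Double> e : row.entrySet()) {
                itemSum.merge(e.getKey(), e.getValue(), Double::sum);
                itemCount.merge(e.getKey(), 1, Integer::sum);
            }
        }

        // adjusted session mean = mean over the observed items of (raw rate / item quotient)
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Double>> s : rates.entrySet()) {
            double adjSum = 0;
            for (Map.Entry<String, Double> e : s.getValue().entrySet()) {
                double itemQuotient = (itemSum.get(e.getKey()) / itemCount.get(e.getKey())) / grandMean;
                adjSum += e.getValue() / itemQuotient;
            }
            result.put(s.getKey(), adjSum / s.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> a = new LinkedHashMap<>();
        a.put("I", 4.0); a.put("J", 8.0); a.put("K", 16.0);
        Map<String, Double> b = new LinkedHashMap<>();
        b.put("H", 4.0); b.put("I", 8.0); b.put("J", 16.0);
        Map<String, Map<String, Double>> rates = new LinkedHashMap<>();
        rates.put("A", a);
        rates.put("B", b);
        System.out.println(adjustedSessionMeans(rates)); // prints roughly {A=7.26, B=11.41}
    }
}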

Friday, October 9, 2009

A closer look at scoring rates

In my blog of 25 August, I described some iterative transformations on scoring rate data from a computer based arithmetic test. I said I would report the results of further iterations if I liked them, and from the time that has passed it should be obvious to anyone reading this that I didn't.

The transformations were based on what I called the scoring rate quotient (SRQ). Essentially I divided the scoring rate for every item in every test session by the mean of all scoring rates for every item in every test session to produce the SRQ for individual session-item combinations and to calculate the mean SRQ for every session and for every item.

To illustrate, imagine two students, A and B, addressing two items, I and J. Imagine in this case that Student B scores at twice the rate of Student A and that Item I is twice as difficult as Item J. The raw scoring rates might look as follows:

Raw Rates Item I Item J Session Mean
Student A 4 8 6
Student B 8 16 12
Item Mean 6 12 9

The scoring rate quotients would then be as follows:

Quotients Item I Item J Session Mean
Student A 0.44 0.89 0.67
Student B 0.89 1.78 1.33
Item Mean 0.67 1.33 1.00

The session quotients can then be used to recalculate the item rates.

Adjusted Rates Item I Item J Session Mean
Student A 6 12 9
Student B 6 12 9
Item Mean 6 12 9

Or the item quotients can then be used to recalculate the session rates.

Adjusted Rates Item I Item J Session Mean
Student A 6 6 6
Student B 12 12 12
Item Mean 9 9 9

Expressing this algebraically, the means are calculated as follows:


Session Mean A = (RIA + RJA)/2 (1)

Session Mean B = (RIB + RJB)/2

Item Mean I = (RIA + RIB)/2

Item Mean J = (RJA + RJB)/2

Grand Mean = (RIA + RJA + RIB + RJB)/4 (2)

The scoring rate quotients are then calculated by dividing the raw scoring rates by the grand mean:


SRQ IA = 4RIA/(RIA + RJA + RIB + RJB) (3)

SRQ IB = 4RIB/(RIA + RJA + RIB + RJB)

SRQ JA = 4RJA/(RIA + RJA + RIB + RJB)

SRQ JB = 4RJB/(RIA + RJA + RIB + RJB)

The session mean quotients are then:


Session A Mean Quotient = 4(RIA + RJA)/2(RIA + RJA + RIB + RJB) (4)


= 2(RIA + RJA)/(RIA + RJA + RIB + RJB)

Session B Mean Quotient = 2(RIB + RJB)/(RIA + RJA + RIB + RJB)

The item mean quotients are:


Item I Mean Quotient = 2(RIA + RIB)/(RIA + RJA + RIB + RJB)

Item J Mean Quotient = 2(RJA + RJB)/(RIA + RJA + RIB + RJB)

And the grand mean of the quotients is:


Grand Mean SRQ = 4(RIA + RJA + RIB + RJB)/4(RIA + RJA + RIB + RJB) (5)


= 1

The adjusted item rates are calculated by dividing the raw item rates by the session mean quotients.


Adj Item IA = RIA(RIA + RJA + RIB + RJB)/2(RIA + RJA) (6)

Adj Item JA = RJA(RIA + RJA + RIB + RJB)/2(RIA + RJA)

Adj Item IB = RIB(RIA + RJA + RIB + RJB)/2(RIB + RJB)

Adj Item JB = RJB(RIA + RJA + RIB + RJB)/2(RIB + RJB)

Now, we have stipulated that item I is twice as hard as item J, so:


RJA = 2RIA (7)
and RJB = 2RIB

So we can re-write expression 6 as:


Adj Item IA = RIA(RIA + 2RIA + RIB + 2RIB)/2(RIA + 2RIA)


= RIA(3RIA + 3RIB)/6RIA


= (3RIA + 3RIB)/6


= (RIA + RIB)/2 (8)

Adj Item IB = RIB(RIA + 2RIA + RIB + 2RIB)/2(RIB + 2RIB)


= RIB(3RIA + 3RIB)/6RIB


= (RIA + RIB)/2
so Adj Item IA = Adj Item IB

Thus the adjusted item rate for item I is identical for both sessions, and also equal to the item mean for item I. The same is true for item J.
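Plugging the example numbers into (8): Adj Item IA = (RIA + RIB)/2 = (4 + 8)/2 = 6, which is exactly the Item I column of the first adjusted table above, and the corresponding working for item J gives (8 + 16)/2 = 12.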

Of course this is the special case envisaged by Rasch, where all items are completed by all students. It was nice to work through this special case, because, in my mind at least, it indicates that a single pass transformation is sufficient, and that there is no need for multiple iterations.

In my next blog, I shall have a closer look at the more general case where not all items are completed by all students.

Thursday, October 8, 2009

UK Car Hire - caveat emptor

The Web is a great boon to travellers. Where once you had to sit like a prune on a travel agent's chair while they fiddled about for hours on a computer, now you can tailor-make your own holiday from your own living room.

As well as the vendor sites, there are these web sites which purport to search for and sort whatever you are looking for. At the top of the list sit what appear to be the best bargains, and the unwary might simply click on these and look no further.

If you type UK car hire into Google, an array of sites like this one appears high on the list. It invites you to enter dates, a pickup location, and a car type, and then runs a search "of up to 40 companies" for you.

A couple of things should be borne in mind when interpreting these results. One is that only 4 companies actually have representation at the airport. From memory these are Avis, Budget, Europcar, and Hertz. The others are scattered through West London, take ages to pick you up, and are hellishly difficult to find when you need to drop the car off.

Another is that the "search" website quotes a very bare hire price. When I ran a search a couple of weeks ago, Thrifty quoted £322 for 9 days, which is approximately £36 a day - and very reasonable it seemed. They told me in advance that 2 child seats would add £5 a day each, bringing the rental to £46 a day - still quite reasonable. What they did not tell me was that collision damage waiver, which used to be about £5 a day, was now an extra £18 a day. That, and a few other extras, brought the total price for the nine days to £649, which works out at over £72 a day, double the original quote. They also offered to sting me for even more if I wanted automatic transmission.

So while the "search" websites might be a useful first step to identify suitable car models, the wary traveller should then go to the four airport based companies and get detailed quotes including all extras, collision damage waiver, and automatic transmission, if required.

Sunday, September 20, 2009

Acronis True Image Hanging

One of my clients uses Acronis True Image Home for their nightly back-up. I don't know the product well, but I like what I've seen so far - mostly. It seems pretty thorough. The backups run like clockwork every night. And I have successfully recovered a system which caught a nasty virus, such that a reformat was the only cost effective fix.

But the other day, the front end application, the management console, would not start. It just froze, with a splash screen reporting that it was checking disk D. Quite why it needs to check disk D before opening is a mystery, especially because that drive is not included in the backup and need never be read from. On the machine in question, D is the manufacturer's "recovery" partition. Quite frankly these partitions are a waste of space, and the word recovery is a complete misnomer - "Factory Reset" partition would be more honest and appropriate. And given that nothing useful is ever written there, it mystifies me that Acronis should hang while trying to read that partition.

Be that as it may, it was hanging and I was in a quandary as to what to do. I tried going away and coming back a few hours later. It was stress reducing, but it didn't fix the problem. I tried searching on strings like "Acronis True Image Hanging", but all that told me was that Acronis seems to hang a lot, in a wide range of circumstances, and there doesn't seem to be any consensus on a fix.

So if there is something I would criticise about Acronis it is the heavy reliance on wizards. Perhaps there is a setting somewhere, which gives an "expert" view, and enables manual editing of tasks, but I certainly haven't found it.

My dilemma was increased by the fact that the nightly backup was working perfectly. All I wanted to do was change the backup folder, to initiate a new full backup and a new month long string of incremental backups. I didn't want to fiddle around uninstalling and reinstalling Acronis, because then I'd lose my working scheduled task. I was quite tempted to run away, and not tell the client that anything was wrong, and let the incremental backups just go on for years and years in the same folder. But I decided that would be irresponsible.

After much deliberation I decided that there had to be a script somewhere, controlling the scheduled tasks, so I set about looking for it. I had a look in the "program files" folder, but there was nothing very promising there. All the dates were way too old. So I changed the folder settings to show both hidden folders and protected operating system files, and went to hunt for application data in documents and settings (all of this is in XP pro by the way). I first looked in the user folder, but there was nothing for Acronis there.

Then I remembered the option in the standard Windows install which says "Do you want this program to be available to all users?" So I checked out All Users\Application Data and sure enough, there was a directory called "scripts". Bingo! The file had a very funny name, and I won't print it here in case some malicious bot is probing my blog, but sure enough it opened with notepad, and it was just an ordinary text file with a script in full English. I hope no one from Acronis reads this and encrypts the next edition, because it made me like the product more. The path to the backup file was easy to find, and I just had to modify two characters to change it to the new folder for the current month.

To my enormous surprise, the script ran perfectly that night, and left the new full backup in the new folder as I intended. Next month I'll just go straight into the script and not bother with the GUI.

Saturday, September 19, 2009

Building a financial mini-app

Building a mini-app from scratch using only text files and the command line is a bit like building a piece of furniture from IKEA. I find that after five steps I realise I made a mistake in step 2, and I have to pull everything apart again and start almost from scratch.

When I was creating the financial transaction table, I decided to dodge the date conversion issue and make the date field text only. After all, it was my intention to read the data back into Excel or Access for a GUI presentation at the end of the day. But when I had imported the data and started to query the tables, I decided I needed to sort by date during processing. So I had to drop the table, with all its data, and create a new one with a real date field.

The irony is that, such are the quirks of MS Access, it was much easier exporting to a CSV file with the exact date format that Apache Derby 10.4.2.0 was looking for than it was getting the data back in. When you export a CSV file, Access gives you many choices on how to export dates, but when you suck it back in it seems a lot more fussy. But that is of no importance. I brought the data back into Access for display purposes only, so a string was just fine at that point.

For the processing itself I used two resultsets and a loop within a loop to scroll through them. This required two open statements. I also needed a third open statement to modify certain records and delete others in the raw data table.

The first resultset brought up all sales transactions. The outer loop scrolled through that. The second resultset was called by the outer loop, and brought up all purchase transactions for the stock code of the current record in the first resultset. The inner loop scrolled through the second resultset accumulating the numbers of shares purchased until it was equal to or greater than the number of shares sold. In the case of inequality an adjustment was made.

To illustrate, here is an extract from the raw table in MS Access.

ID Date TrType Code Price Qty
81 18/03/2008 Buy ANZ $20.78 50
206 7/11/2008 Buy ANZ $16.42 60
221 13/11/2008 Buy ANZ $15.00 65
228 17/11/2008 Buy ANZ $13.50 70
348 13/07/2009 Buy ANZ $14.40 695
352 5/08/2009 Sell ANZ $19.70 -50
362 9/09/2009 Sell ANZ $22.19 -290

And here is the corresponding data in the new table.

ID Date TrType Code Price Qty
352 2009-08-05 Sell ANZ $19.70 -50
352 2008-03-18 Buy ANZ $20.78 50
362 2009-09-09 Sell ANZ $22.19 -290
362 2008-11-07 Buy ANZ $16.42 60
362 2008-11-13 Buy ANZ $15.00 65
362 2008-11-17 Buy ANZ $13.50 70
362 2009-07-13 Buy ANZ $14.40 95

At first glance, the new table does not look very different from the old one. But on closer examination there are two key changes. First, the individual transaction ID in the first table has been replaced in the second table by the ID of the sale each row belongs to. Second, the data has been reordered such that each sale has associated with it just enough of the buy transactions to cover it exactly.

Looking at the actual data, the first sale was easy, because there was a buy which exactly matched it. But the second sell straddles three complete buy transactions and part of a fourth.

If anyone can do that in SQL, I'd like to see it. The only method I could think of was using good old fashioned code, as described. It was a bit of effort, but it was worth it, because I now have a report which rigorously and systematically matches sales with purchases, and tells me at a glance what my exposure to realised profits is in the current year.
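For the record, a stripped down sketch of the nested loop looks something like the code below. The table and column names (rawtrans, matched and so on) are placeholders rather than my actual schema, error handling is left out, and the real code also has to update or delete the buy rows it consumes so that the next sale starts from the remainder.

import java.sql.*;

public class FifoMatcher {

    public static void match(Connection conn) throws SQLException {
        String sells = "SELECT id, trdate, code, price, qty FROM rawtrans "
                     + "WHERE trtype = 'Sell' ORDER BY trdate";
        String buys  = "SELECT id, trdate, code, price, qty FROM rawtrans "
                     + "WHERE trtype = 'Buy' AND code = ? ORDER BY trdate";
        String insert = "INSERT INTO matched (saleid, trdate, trtype, code, price, qty) "
                      + "VALUES (?, ?, ?, ?, ?, ?)";

        try (Statement outer = conn.createStatement();
             PreparedStatement inner = conn.prepareStatement(buys);
             PreparedStatement ins = conn.prepareStatement(insert);
             ResultSet sellRs = outer.executeQuery(sells)) {

            while (sellRs.next()) {                       // outer loop: one pass per sale
                int saleId = sellRs.getInt("id");
                int toCover = -sellRs.getInt("qty");      // sales are stored as negative quantities
                writeRow(ins, saleId, sellRs.getDate("trdate"), "Sell",
                         sellRs.getString("code"), sellRs.getBigDecimal("price"), -toCover);

                inner.setString(1, sellRs.getString("code"));
                try (ResultSet buyRs = inner.executeQuery()) {
                    while (toCover > 0 && buyRs.next()) { // inner loop: accumulate buys until covered
                        int qty = Math.min(buyRs.getInt("qty"), toCover);
                        writeRow(ins, saleId, buyRs.getDate("trdate"), "Buy",
                                 buyRs.getString("code"), buyRs.getBigDecimal("price"), qty);
                        toCover -= qty;
                        // the real code also adjusts or deletes the consumed buy rows in rawtrans here
                    }
                }
            }
        }
    }

    private static void writeRow(PreparedStatement ins, int saleId, Date d, String type,
                                 String code, java.math.BigDecimal price, int qty) throws SQLException {
        ins.setInt(1, saleId);
        ins.setDate(2, d);
        ins.setString(3, type);
        ins.setString(4, code);
        ins.setBigDecimal(5, price);
        ins.setInt(6, qty);
        ins.executeUpdate();
    }
}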

Thursday, September 17, 2009

Using Java with Shares

I have this weird belief that spending too long looking at a share portfolio is morally wrong, because it's only money. But then I get this guilt thing about a lack of due diligence. My compromise is to spend the weekdays on my main project, which is developing interactive assessment software, and a part of the weekend looking at and thinking about shares.

My traditional tools for this are Microsoft Access and Excel. And while the market was diving, and I was only buying shares, that worked fine. All I had to do was track what I had bought, what I had paid for it, and what it is worth now. But now that the market is rising again, and a handful of shares have risen by silly proportions, I feel I need to sell small amounts of them, at least to recoup the original investment cost.

But I am doing quite nicely with baby bonuses and other means tested benefits, and I don't want to blow my income out of the water. So I need a quick but accurate means of tracking the cumulative effect of a series of small transactions. The typical scenario is a holding built up from say five buy transactions. A proportion of that holding is then sold. I am not sure whether the Tax Department imposes FIFO or LIFO accounting on such transactions. I should look it up, but for now I assume you can do what you like as long as you are consistent. I shall use FIFO.

So I need to parse through the holding, comparing each purchase transaction with each sale transaction. If the first purchase is greater than or equal to the first sale transaction, the calculation is quite simple. I can just apportion the total purchase costs over the number of shares being sold and subtract that from the sale proceeds to calculate the profit or loss on the transaction. But if the first purchase is less than the first sale, I need to apportion the sale proceeds over the number of shares in the first purchase. I then need to apportion the purchase costs for the second buy batch over the remaining shares in the sale transaction.
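A made-up example to fix ideas (ignoring brokerage): buy 100 shares at $10, later buy another 100 at $12, then sell 150 at $15. Under FIFO the first 100 shares sold are matched against the $10 parcel, giving a profit of 100 × ($15 - $10) = $500, and the remaining 50 against the $12 parcel, giving a further 50 × ($15 - $12) = $150, which leaves 50 shares on the books at $12.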

I am sure there is an abundance of software out there that does all this, but there are two reasons for doing it myself. First, my business model assumes a low cost base, which means not wasting money on expensive accounting or trading packages. Second, it represents good practice at manipulating data in Java.

My first step will be to export my transaction table from Access to a csv file. My second step will be to create a new Derby database using the embedded driver. This database will not be accessed from an applet. I want it on my local machine, and I want it in a folder which is included in my regular working data backup.

The third step will be to create tables to store both the raw data and completed transaction data. The fourth step will be to write code to suck the csv data into the raw data table. The fifth and most difficult step will be to write code to extract data from the raw data table and build this into a nice neat completed transaction table.
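As a note to self, step four need not be complicated. The sketch below is indicative only: the database name, table layout, file name and column order are placeholders, and it assumes the table already exists, that the dates have been exported in the yyyy-mm-dd form Derby expects, and that the price column is a plain number with no dollar sign.

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CsvToDerby {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        // embedded database created in a local folder that is part of the normal data backup
        Connection conn = DriverManager.getConnection("jdbc:derby:shareData;create=true");

        String sql = "INSERT INTO rawtrans (id, trdate, trtype, code, price, qty) VALUES (?, ?, ?, ?, ?, ?)";
        try (BufferedReader in = new BufferedReader(new FileReader("transactions.csv"));
             PreparedStatement ps = conn.prepareStatement(sql)) {
            String line = in.readLine();                      // skip the header row
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",");                 // naive split; fine while no field contains a comma
                ps.setInt(1, Integer.parseInt(f[0]));
                ps.setDate(2, java.sql.Date.valueOf(f[1]));   // expects yyyy-mm-dd
                ps.setString(3, f[2]);
                ps.setString(4, f[3]);
                ps.setBigDecimal(5, new java.math.BigDecimal(f[4]));
                ps.setInt(6, Integer.parseInt(f[5]));
                ps.executeUpdate();
            }
        }
        conn.close();
    }
}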

Tuesday, August 25, 2009

GUI for thinking

Whatever bad things some people say about Microsoft, in the olden days they brought to market a raft of products which were accessible, easy to use, and useful. MS Access is an example. It may have limitations as a commercial database engine, but as a sketch pad, a tool for collecting one's thoughts, it is, in my opinion, hard to beat.

My current task is to design a set of iterations through scoring rate data to render the scoring rate as an objective measure of student ability and item difficulty. The raw data is set out in a single table, as shown in my last blog. On this I have written two queries:

SELECT [HAItemB].[sessidx],
Avg([HAItemB].[rate]) AS AvgOfrate
FROM HAItemB
GROUP BY [HAItemB].[sessidx];

and

SELECT HAItemB.item,
Avg(HAItemB.rate) AS AvgOfrate
FROM HAItemB
GROUP BY HAItemB.item
ORDER BY HAItemB.item;

These queries calculate the average raw scoring rate for each session and each item. The item query looks like this:

Item AvgOfrate
1+1 34.000
1+2 30.877
1+3 32.935
1+4 31.286
1+5 38.674

A third query calculates the overall mean scoring rate:

SELECT Avg(HAItemB.rate) AS AvgOfrate  FROM HAItemB;

The average rate happens to be 18.185, across a grand total of 14,480 records.

I then joined this query with the two previous queries to calculate the scoring rate quotient (SRQ) for each student session and each item. The results for the above items are shown below.

Item ItemRate0 AvRate ItQ1
1+1 34.000 18.185 1.870
1+2 30.877 18.185 1.698
1+3 32.935 18.185 1.811
1+4 31.286 18.185 1.720
1+5 38.674 18.185 2.127

I then used the session quotients to recalculate the items rates, and the item quotients to recalculate the student/session rates, as proposed in my last blog but one. The table/array below shows this being done for five items in the first session:

Sessidx Item Rate ItQ1 SRateAdj1 SQ1 ItRateAdj1
1 1+2 67 1.698 39.461 1.642 40.805
1 3+1 60 1.784 33.640 1.642 36.541
1 2+3 55 1.552 35.435 1.642 33.496
1 5+2 40 1.481 27.000 1.642 24.361
1 4+4 50 1.938 25.806 1.642 30.451

And this is where the GUI comes in. I can sit staring at those numbers and thinking about them. At first I could see that a number (Rate) was being divided by two different numbers (ItQ1 and SQ1), and I thought why not save time, multiply them together, and divide Rate by the resulting product? But, to paraphrase Buffy, that would be wrong.

It is the item adjusted session rates (SRateAdj1), which are grouped to form the first pass adjusted session average rates, and the session adjusted item rates (ItRateAdj1) which are grouped to form the first pass adjusted item average rates.

The queries are almost the same as before, except that they are written against the table containing the adjusted rates. So for the sessions we have:

SELECT AdjSesstable1.sessidx,
Avg(AdjSesstable1.SRateAdj1) AS AvgOfSRateAdj1
FROM AdjSesstable1
GROUP BY AdjSesstable1.sessidx;

and for items we have:

SELECT AdjSesstable1.item,
Avg(AdjSesstable1.ItRateAdj1) AS AvgOfItRateAdj1
FROM AdjSesstable1
GROUP BY AdjSesstable1.item
ORDER BY AdjSesstable1.item;

For completeness, I ran a query to compute the overall adjusted average rates, but guess what? They were identical to each other and to the overall raw mean. I guess a true mathematician would have known that, but I was quite surprised. Anyway, from there it was quite easy to compute the second pass quotients. These are shown for items below, side by side with first pass numbers:

Item ItemRate0 ItemRate1 AvRate ItQ1 ItQ2
1+1 34.000 35.691 18.185 1.870 1.963
1+2 30.877 32.057 18.185 1.698 1.763
1+3 32.935 33.249 18.185 1.811 1.828
1+4 31.286 35.697 18.185 1.720 1.963
1+5 38.674 36.070 18.185 2.127 1.983

Although we are only looking at five items here, I find these numbers very encouraging. On the first pass, I asked myself the question: why is the item "1+5" easier than the item "1+1"? Common sense would suggest this was anomalous, caused by the chance happenstance that in this sample, more able students addressed the item "1+5". And after the first iteration, when item rates have been adjusted for the ability of the students addressing them, the estimate of difficulty (given by the reciprocal of the SRQ) of the item "1+5" has been increased, while that for "1+1" has been reduced.

I think that's enough for one blog. I'll continue with more iterations tomorrow, and if I like the results, I'll report on them.

Thursday, August 20, 2009

Transforming text file data

I have now transformed the raw data from my VB Application - Active Math so that it looks like this:

7/28/2000 12:41:55 PM11 3+3 1 27
7/28/2000 12:41:55 PM11 2+2 1 27
7/28/2000 12:41:55 PM11 1+1 1 35
7/28/2000 12:41:55 PM11 4+4 1 32
7/28/2000 12:41:55 PM11 5+3 1 8

I'll refrain from posting the code, but the important links were first the Character Streams lesson, especially the example from the lower half of the page entitled Line-Oriented I/O. Also from the same thread was the lesson entitled Scanning. This has nothing to do with flat bed scanners, and in VB would probably be called parsing. From this lesson I followed the link to the scanner class in the API and reversed up to the parent package (java.util) and then back down to the StringTokenizer class.

Useful forum threads were this one, which gave me the idea of using the StringTokenizer, and this one, which discussed how to use it.

I then needed somewhere to store the data while reading it. The relevant main trail here is Collections, and within that Collection Interfaces, and within that I chose The List Interface. The particular implementation of this interface, which I selected for no particular reason, was ArrayList. This had all the methods I needed to add elements one at a time, read them back, and clear them as and when needed.

Finally I returned to the Character Streams lesson to write the transformed data back to a new text file. In so doing I sidestepped data design and connection issues, so that I could concentrate on the mechanics of reading text file data and transforming it.
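I said I would refrain from posting the code, so treat the following as no more than a skeleton of the reading side, with invented file names; the real code also deals with the session date stamp properly and with ragged rows.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TransformRawData {

    public static void main(String[] args) throws IOException {
        // file names are placeholders for the CSV files produced by the old VB application
        try (BufferedReader in = new BufferedReader(new FileReader("session.csv"));
             PrintWriter out = new PrintWriter(new FileWriter("transformed.txt"))) {

            String itemLine;
            // each student session occupies three consecutive rows: items, results, rates
            while ((itemLine = in.readLine()) != null) {
                String resultLine = in.readLine();
                String rateLine = in.readLine();
                if (resultLine == null || rateLine == null) break;

                List<String> items = tokens(itemLine);
                List<String> results = tokens(resultLine);
                List<String> rates = tokens(rateLine);

                // in the real data the session date/time and student number come from elsewhere
                // in the file; a fixed placeholder keeps the sketch simple
                String sessionHeader = "7/28/2000 12:41:55 PM11";

                for (int i = 0; i < items.size(); i++) {
                    out.println(sessionHeader + " " + items.get(i) + " "
                            + results.get(i) + " " + rates.get(i));
                }
            }
        }
    }

    private static List<String> tokens(String line) {
        List<String> list = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            list.add(st.nextToken().trim());
        }
        return list;
    }
}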

Tuesday, August 18, 2009

Where next?

My blog will become more blog like again for a while now, and more about learning Java, because I haven't a clue where I am going next.

Already I have rebuilt the bare bones of an application, once created in VB6, and posted it as an applet on a web page. I have also revisited some raw theory, which had been floating around in the back of my mind for years. I am now satisfied that I know what I want to estimate, and I know in theory how I want to estimate it. But translating that into practice will be a bit harder.

I have a pile of data collected years ago from the VB app. The data was never used at the time and was invisible to the user. The code to collect it was tacked on as an afterthought, "just in case" I ever got around to using it. I had an idea what data I needed to collect, but I had no idea how I would process it, so the data layout was designed purely for ease of collection - i.e. with a minimum of code and in a format which took up a minimal amount of space. So now I have a bundle of CSV text files, storing data in the following format:

3+3 2+2 1+1 4+4 5+3
1 1 1 1 1
27 27 35 32 8

Each file contains many more columns and many more rows, but they all follow the same pattern as depicted in the array above. The first row contains a list of addition test items. The second row contains a Boolean result, where 1 represents a correct answer and 0 represents an incorrect answer. The third row contains the scoring rate for the item, expressed as correct answers per minute (capm). Each set of three rows represents a student session.

To process this according to the method outlined in my last blog, I need code which parses through this data, recording an average scoring rate for the student session, and the scoring rate for each item, in a table which associates the rate with the student session.

Choosing a layout for the transformed data is something of a conundrum. There could be an infinity of test items, so having a field (or column) for each item would be absurd. But there could also be an infinity of student sessions, so having a field for each session would also be absurd.

Somewhere there needs to be an index of items, but there also needs to be a larger table recording every time an item has been used, together with a session index and summary information from the session. This implies a need for a session index too, along with a larger table listing every item used in that session and summary information about that item.

Writing that last sentence reminds me that I have been here before. The gross item table is similar to the data recorded by my test applet. That looks something like this:

23 1247061594801 1 3 + 2 = 1 8.683068
24 1247061594801 1 5 + 3 = 1 36.51856
25 1247061594801 1 2 + 5 = 1 39.16449
26 1247061594801 1 12 + 5 = 1 32.91278
27 1247061594801 1 12 + 4 = 1 37.45318

Here the first column is an overall index, which is probably redundant. Next there is a session index, based on the start time of the session. The third column represents an index for the item type, and the fourth column is the item itself, written in longhand text. Just now, with simple arithmetic operations, recording the item in full in this table is not an issue, but in the future, when items might include questions on history and literature, this will need to be replaced by an index. The fifth and sixth columns record results analogous to the second and third rows of the first table above. The fifth column shows a Boolean result, and the sixth column shows the scoring rate for the item.
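Spelling that out as a structure (field names are mine, purely for thinking out loud), one row of the gross item table boils down to something like this:

// one row of the gross item table described above
public class ItemResult {
    long overallIndex;    // probably redundant
    long sessionIndex;    // derived from the session start time in milliseconds
    int itemTypeIndex;    // index for the item type
    String itemText;      // the item written out in full, e.g. "3 + 2 ="; later to be replaced by an item index
    boolean correct;      // dichotomous result
    double rate;          // scoring rate for the item, in capm
}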

So should I transform the old table into the format of the new one, or should I start again? I think for now I should work with the current "new" table. So I need to suck my old CSV files into that. My next step will therefore be a visit to the Java Tutorial thread Basic I/O.

Monday, August 17, 2009

The Scoring Rate Quotient (SRQ)

Rasch expressed the expected reading rate in a reading test in relation to expected reading rate in a reference test as follows:


εi = λvi/ λv1

where λvi is the reading rate of the generic or typical student v in the reading test in question, and λv1 is the reading rate of the generic or typical student v in the reference test.

That translates fine into an estimation methodology if you have a very large data set, where all students address all tests, and the tests themselves are substantial enough for averaging to happen within them. You simply average the results to get your ratio.

It doesn't work so well if you are interested in estimating the difficulty of individual test items, and especially not if you are working with data from a modern computer based test, where the items themselves are generated randomly within difficulty levels, and where the difficulty levels are set by student performance. If such a test is working properly, the difficult items will only be addressed by the more able students, and the easy items will be addressed more often by the less able students. So if the test is working as it should, the data it generates will be biased. The difficult items will appear easier than they should, because the able students who tackle them tend to have high scoring rates, and the easy items will appear harder than they should, because the less able students who tackle them tend to have low scoring rates.

An accurate estimate of item difficulty in such a test requires that student ability be taken into account, which in turn will require some iteration through the data. Suppose we begin with a crude estimate of student ability. This must be taken into account in the estimate of item difficulty, which in turn can be used to gain a better estimate of student ability. But how?

I suggest an old fashioned quotient. Record the scoring rates of all participants and calculate the mean. Then, when assessing item difficulty (or easiness), adjust the scoring rate recorded by any student against that item by the ratio of their mean scoring rate to the overall mean. You could call this ratio the Scoring Rate Quotient (SRQ). So if Student A's mean scoring rate is twice the overall mean, his SRQ is 2, and you need to adjust the scoring rate recorded by that student against any item by a factor, which reflects this quotient. But of course, because able students tend to record higher scoring rates, the appropriate factor is not 2 but 1/2, or more generally 1/SRQA.

Similarly, the item scoring rates should be laid out on a spectrum and the mean calculated. Then in the second pass at estimating student ability, the scoring rate recorded against each item should be adjusted according to the SRQ of that item. And again if Item1 has an SRQ of 2, the scoring rate of any student tackling that item should not be multiplied by 2 but 1/2 or 1/SRQ1. The scoring rate is adjusted downwards, because it was an easy item, and a high scoring rate on that item should carry less weight than that recorded on a harder item.

Sunday, August 16, 2009

Rasch theoretical analysis of timed reading tests

If the probability of an event occurring in a very short interval of time is θ, the probability that the event occurs a times in n intervals may be estimated by the Poisson distribution:


p{a|n} ≈ (nθ)^a e^(-nθ) / a!   (10)

Suppose n intervals may be aggregated into a time period of interest. For a telephone exchange, this might be a period of peak demand. For a reading test it might be five or ten minutes, however long the test lasts. Now imagine another time period used to define the frequency of events λ. Rasch (op. cit. page 34) uses a 10 second interval, I prefer a minute, and most physicists use a second, but as mentioned in the previous blog, it makes no difference to the essence of the argument.

Note, however, in just a few lines we have referred to three distinct time intervals. First there is the very short time interval, used to define θ, and during which the event will only ever occur once. Rasch (op. cit. page 35) uses one hundredth of second for this, but conceptually it could be a lot smaller. Second, in order of size, is the frequency defining interval, such that λ is the number of times the event occurs (or is expected to occur) in this interval. Third is the experimental period, or period of interest made up by n of the very short intervals, and which could also be expressed as t of the frequency defining intervals (seconds, minutes or whatever), such that the expected number of events in the period could be expressed as either the left or right hand side of the equation below:


nθ = λt   (30)

The probability of a specified number of events a occurring in the experimental time period t then becomes:


p{a|t} ≈ (λt)^a e^(-λt) / a!   (31)

Rasch then makes an observation which I skirted over on my first reading of the book, but which I used implicitly in my last blog: "the event that the number of words a read in a given time T exceeds a given number n is identical with the event that the time t used for reading n words is less than T."


p{a ≥ N|T} = p{t ≤ T|N} (32)

The left hand side is the sum of all the probabilities that a is N, N+1, N+2 ... :


p{a ≥ N|T} = e^(-λt) ((λt)^N/N! + (λt)^(N+1)/(N+1)! + (λt)^(N+2)/(N+2)! + ...)   (33a)

p{t ≤ T|N} = e^(-λt) ((λt)^N/N! + (λt)^(N+1)/(N+1)! + (λt)^(N+2)/(N+2)! + ...)   (33b)

Rasch then throws in a special case, which seems intuitively obvious, but I am sure he has good reason. In 33b he sets N to zero, so he is calculating the probability that within a certain time at least zero events have occurred:


p{t ≤ T|0} = e^(-λt) ((λt)^0/0! + (λt)^1/1! + (λt)^2/2! + ...)

1 = e^(-λt) (1 + λt + (λt)^2/2! + ...)   (34)

e^(λt) = 1 + λt + (λt)^2/2! + ...

λt = ln(1 + λt + (λt)^2/2! + ...)

All of which is supposed to add up to 1. I can't see it myself, but perhaps we'll use the expression later. A second special case is when N is 1, which is the probability that either zero or 1 events take place in time T:


p{t ≤ T|1} = e^(-λt) ((λt)^1/1! + (λt)^2/2! + (λt)^3/3! + ...)

= e^(-λt) (λt + (λt)^2/2! + (λt)^3/3! + ...)

= e^(-λt) λt (1 + λt/2! + (λt)^2/3! + ...)

= 1 - e^(-λt)   (35)

Again I can't see how Rasch gets to my 35 (his 6.4, op. cit. page 39), but I've included it for completeness, in case it makes sense later. And from here he jumps to:


p{t|1} = λe^(-λt)   (36)

where p{t|1} is the probability distribution for the reading time of the first and any subsequent word. Rasch goes on to show a similar distribution for the reading times of N words, but I shall skip that, because the essential difference between a speeded test and my use of scoring rates is that I focus on the scoring "rate" for individual items, by recording the time taken on each item and dividing that into my unit time for computing rates. So for me, equation 35 is quite interesting, but anything on multiple words (or items) is not.
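As an aside, and this is purely my own working rather than anything in the book, both of the steps I said I could not see seem to drop straight out of the series expansion of the exponential, e^(λt) = 1 + λt + (λt)^2/2! + ..., since

e^(-λt) (1 + λt + (λt)^2/2! + ...) = e^(-λt) e^(λt) = 1, which is my 34, and

e^(-λt) (λt + (λt)^2/2! + ...) = e^(-λt) (e^(λt) - 1) = 1 - e^(-λt), which is my 35.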

The next section (op. cit. page 40) is also interesting to me, because Rasch talks of two students, A and B, who read at different rates. In fact he says student A reads twice as fast as student B:


λA = 2λB   (37)

Rasch then explains how to estimate the relative difficulty of texts, based on observed reading speeds. Each pupil reads a series of texts numbered 1 to k, and for each:


λA1 = 2 λB1

λAi = 2 λBi

Dividing:


λAi/λA1 = λBi/λB1   (38)

So the ratio between the expected reading speeds for text i and another text (such as text 1) is constant for all pupils, regardless of the ratio of the expected reading speeds for the pupils. Rasch generalises this for student v:


λvi/ λv1 = εi (39)

Rearranging:


λvi = λv1εi (40)

Rasch calls λv1 a person factor and εi a text factor for reading speed (op. cit. page 40). And consistently with his preference for difficulty as a term over easiness, he defines difficulty in relation to reading speed as:


δi = 1/εi (41)

Redefining λv1 as ζv, Rasch can now express the expected reading speed for any student/text combination as:


λvi = ζv/ δi (42)

Regardless of the contortions required to express expected reading speed in the same way as the probability of a correct reading, Rasch emphasises that accuracy and speed are not the same things, although they may be related, and addressing any such relationship remains an interesting possible topic for empirical research.

Sunday, August 9, 2009

Scores versus scoring rates

The next chapter in the Rasch book addresses reading rates. This is traditionally one of my favourite chapters and I once published a paper based on it. I like scoring rates because I believe intuitively that they yield a more reliable estimate of ability than raw scores. Many years ago I presented a somewhat inane paper at a fortunately sparsely attended WAIER forum. I have long been looking to find a more substantive argument, and I believe I am getting quite close. My inspiration comes from this web page.
Let's imagine two students sitting a test comprising a single dichotomous item. Imagine the ability of the first student (Student A) in relation to the difficulty of the item is such that the probability of a correct answer is 55%. Imagine the ability of the second student (Student B) in relation to the difficulty of the item is such that the probability of a correct answer is 45%. What is the probability that these students will be properly ranked by this test, according to their ability?
For the students to be ranked at all, they cannot return the same score, and for them to be ranked correctly Student A must return a correct answer AND Student B must return an incorrect answer. The way I chose the numbers, the probability of that outcome happens to be 0.55 × 0.55 = 0.3025, or approximately 30%.
In general terms, if the probability of Student A returning a correct answer is θA and the probability of Student B returning a correct answer is θB, then the probability of a correct ranking is:

P(ranktrue) = θA(1 - θB) (28)
Before moving on to scoring rates, I'd like to say a few words about this expression in relation to the Poisson estimation favoured by Rasch and discussed in my last few blogs.
Suppose we fix the difference between the two probabilities at a small amount α (say 10%). The expression then becomes:

P(ranktrue) = θA(1 - θA + α)

= θA - θA^2 + αθA

= -θA^2 + (1 + α)θA   (29)
which is a quadratic in θA and, unless I am very much mistaken, will plot as a parabola. Just for fun, I charted it with θA ranging from 15% to 95% and α set at 10%. Sure enough, it plotted a lovely parabola, with P(ranktrue) ranging from 14% at the outer edges to 30% on the nose.
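For the record, a little algebra puts the maximum at θA = (1 + α)/2, which with α at 10% is 0.55, giving 0.55 × 0.55 = 0.3025, the 30% peak on the chart.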


It follows that the most accurate student rankings are most likely to be produced by tests where student ability and item difficulty are well matched, such that the probability of success (or failure) is in the mid range. Contrast this with the very low values for the probability of an event (such as an error in a test) required for the Poisson distribution to be a reasonable approximation of the binomial distribution.
It is a paradox that the Poisson distribution, which, in my opinion, doesn't work well for dichotomous scores, is exactly what is required to predict the frequency of events in a given time period, such as the number of correct answers in a test returned per minute.
By way of a side note, speed tests, which were popular in the 1960s, received a lot of bad press in the subsequent decades, and have never again become fashionable. But for reasons argued eruditely in my doctoral thesis, the scoring rate, recorded silently by a computer in a test with no time limit, has none of the disadvantages of a pencil and paper timed test.
Furthermore (and this wasn't in the thesis), the Poisson process is the same whichever time interval you choose to work with, provided the expected number of observations is scaled in proportion to the interval. So if the expected scoring rate is 10 correct answers per minute, you can think equally of 10 observations in a minute, 20 observations in 2 minutes, or one observation in 6 seconds.
So you don't need to rush children by giving them a minute long speed test. You simply record how long it takes them to record a correct answer, and then you can think about the probability of a single correct answer being given in that time interval.
Now let's think about two students addressing a single item in a computer based test which records the time taken and reveals a scoring rate: Student A, for whom, because of his ability in relation to the difficulty of the item, the expected scoring rate is 5.5 correct answers per minute (capm), and Student B, for whom it is 4.5 capm. What is the probability that these students will be properly ranked by this test, according to their ability?
For the test to record a "true" ranking, whatever the scoring rate of Student B, that of Student A has to be higher. If Student B scores zero capm, Student A is free to score anything above zero. If Student B scores 1 capm, Student A must score above 1 capm. So for every possible score of Student B, we need to combine that (by multiplication) with the probability that Student A scores anything above that, and we need to aggregate the probability of all these possibilities.
I am cheating a little here. I have the SOCR Poisson Experiment page open so as to get an idea of the numbers flowing from the parameters I set above. If we restrict ourselves to integer scoring rates, the range of scoring rates for Student B with greater than (nearly) zero probability is zero to 13. The range of scoring rates for Student A with greater than (nearly) zero probability is zero to 15.
I shall not attempt a proper equation to express this. There isn't room in a blog, and I can't express summation correctly so I'll put the "from to" in brackets before the sigma. The gist of it might then be:
(y=0 to 13)Σ ( (4.5^y e^(-4.5)/y!) × ((x=y+1 to 15)Σ 5.5^x e^(-5.5)/x!) )
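For anyone who wants to check the arithmetic, a few lines of Java (my own sketch, using the means and cut-offs above) compute the same double sum directly:

public class RankingProbability {

    // Poisson probability of exactly k events when the mean is lambda
    static double poisson(double lambda, int k) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i;
        }
        return p;
    }

    public static void main(String[] args) {
        double meanA = 5.5, meanB = 4.5;
        double total = 0;
        for (int y = 0; y <= 13; y++) {          // possible integer rates for Student B
            double inner = 0;
            for (int x = y + 1; x <= 15; x++) {  // Student A must score strictly higher
                inner += poisson(meanA, x);
            }
            total += poisson(meanB, y) * inner;
        }
        System.out.println(total);               // prints roughly 0.56
    }
}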
Cheating again, the core figures below are cribbed from the SOCR Poisson Experiment page. The right hand column contains each term of the outer sum (the Student B probability multiplied by the inner Student A sum), and the overall total is shown in the bottom line.
Student B (mean 4.5)    Student A (mean 5.5)    Combo
y      P(y)             x      P(x)
0      0.01111          1      0.02248          0.01106256
1      0.04999          2      0.06181          0.04865277
2      0.11248          3      0.11332          0.10251877
3      0.16872          4      0.15582          0.13465881
4      0.18981          5      0.1714           0.12191496
5      0.17083          6      0.15712          0.08044385
6      0.12812          7      0.12345          0.04020149
7      0.08236          8      0.08487          0.01567558
8      0.04633          9      0.05187          0.00488596
9      0.02316          10     0.02853          0.00124114
10     0.01042          11     0.01426          0.00026113
11     0.00426          12     0.00654          4.6008E-05
12     0.0016           13     0.00277          6.816E-06
13     0.00055          14     0.00109          8.195E-07
                        15     0.0004

Sum of Combo: 0.56157066
So for the parameters set out in this example, the probability of a correct ranking is 56%, almost twice that for the dichotomous test.
Of course there is absolutely no reason to assume that just because two students have probabilities of 55% and 45% of answering a dichotomous item correctly, the same students will have expected scoring rates of exactly 5.5 and 4.5 capm on the same item. However, the orders of magnitude are not outrageous.
If anything, the comparison understates the case for scoring rates: the whole range of probabilities on a dichotomous item is zero to 100%, whereas on my computer based interactive test the range of observed scoring rates extends well beyond zero to 10 capm. For 7-9 year old children I have observed rates up to 20 capm, and if you extend the age range to 12 year olds the observed scoring rates sometimes go as high as 40 capm.