My blog will become more blog-like again for a while, and more about learning Java, because I haven't a clue where I am going next.
Already I have rebuilt the bare bones of an application, originally written in VB6, and posted it as an applet on a web page. I have also revisited some raw theory, which had been floating around in the back of my mind for years. I am now satisfied that I know what I want to estimate, and I know in theory how I want to estimate it. But translating that into practice will be a bit harder.
I have a pile of data collected years ago from the VB app. The data was never used at the time and was invisible to the user. The code to collect it was tacked on as an afterthought, "just in case" I ever got around to using it. I had an idea what data I needed to collect, but I had no idea how I would process it, so the data layout was designed purely for ease of collection - i.e. with a minimum of code and in a format which took up a minimal amount of space. So now I have a bundle of CSV text files, storing data in the following format:
Each file contains many more columns and many more rows, but they all follow the same pattern as depicted in the array above. The first row contains a list of addition test items. The second row contains a Boolean result, where 1 represents a correct answer and 0 represents an incorrect answer. The third row contains the scoring rate for the item, expressed as correct answers per minute (capm). Each set of three rows represents a student session.
To process this according to the method outlined in my last blog, I need code which parses through this data, recording an average scoring rate for the student session, and the scoring rate for each item, in a table which associates the rate with the student session.
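As a rough sketch of that parsing step, the Java below takes the three rows of one session and returns the per-item results plus an average rate. It rests on several assumptions of mine: the rows are comma-separated, no item contains a comma, and the "average scoring rate" is the plain arithmetic mean of the per-item rates. The names `SessionParser`, `ItemResult`, `parseSession` and `averageRate` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SessionParser {

    /** One item attempt: the item text, whether it was correct, and its rate in capm. */
    static final class ItemResult {
        final String item;
        final boolean correct;
        final double rate;

        ItemResult(String item, boolean correct, double rate) {
            this.item = item;
            this.correct = correct;
            this.rate = rate;
        }
    }

    /** Parses the three rows of one session (items, 1/0 results, rates). */
    static List<ItemResult> parseSession(String itemRow, String resultRow, String rateRow) {
        String[] items = itemRow.split(",");
        String[] results = resultRow.split(",");
        String[] rates = rateRow.split(",");
        if (items.length != results.length || items.length != rates.length) {
            throw new IllegalArgumentException("The three rows of a session differ in length");
        }
        List<ItemResult> session = new ArrayList<>();
        for (int i = 0; i < items.length; i++) {
            boolean correct = "1".equals(results[i].trim());   // 1 = correct, 0 = incorrect
            double rate = Double.parseDouble(rates[i].trim()); // correct answers per minute
            session.add(new ItemResult(items[i].trim(), correct, rate));
        }
        return session;
    }

    /** Arithmetic mean of the item rates (an assumed definition); 0.0 for an empty session. */
    static double averageRate(List<ItemResult> session) {
        if (session.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (ItemResult r : session) {
            sum += r.rate;
        }
        return sum / session.size();
    }
}
```

Calling `parseSession` once per group of three rows, then `averageRate` on the result, gives both figures needed for each row of the transformed table.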
Choosing a layout for the transformed data is something of a conundrum. There could be an infinity of test items, so having a field (or column) for each item would be absurd. But there could also be an infinity of student sessions, so having a field for each session would also be absurd.
Somewhere there needs to be an index of items, but there also needs to be a larger table recording every time an item has been used, together with a session index, and summary information from the session. This implies a need for a session index, but there also needs to be a larger table listing every item used in that session, and summary information about that item.
Writing that last sentence reminds me that I have been here before. The gross item table is similar to the data recorded by my test applet. That looks something like this:
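| Index | Session | Item type | Item | Correct | Rate (capm) |
|---|---|---|---|---|---|
| 23 | 1247061594801 | 1 | 3 + 2 = | 1 | 8.683068 |
| 24 | 1247061594801 | 1 | 5 + 3 = | 1 | 36.51856 |
| 25 | 1247061594801 | 1 | 2 + 5 = | 1 | 39.16449 |
| 26 | 1247061594801 | 1 | 12 + 5 = | 1 | 32.91278 |
| 27 | 1247061594801 | 1 | 12 + 4 = | 1 | 37.45318 |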
Here the first column is an overall index, which is probably redundant. Next there is a session index, based on the start time of the session. The third column represents an index for the item type, and the fourth column is the item itself, written in longhand text. Just now, with simple arithmetic operations, recording the item in full in this table is not an issue, but in the future, when items might include questions on history and literature, this will need to be replaced by an index. The fifth and sixth columns record results analogous to the second and third rows of the first table above. The fifth column shows a Boolean result, and the sixth column shows the scoring rate for the item.
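To make that column layout concrete, here is a minimal Java sketch of one row of the table, together with a small item index of the kind that could eventually replace the longhand item text. Both classes are hypothetical illustrations, not the applet's actual code. The names `ItemIndex`, `ItemRecord` and `idFor` are made up, and numbering items from 1 in order of first appearance is just one possible scheme. The session index is held as a `long` because a value like 1247061594801 is too large for an `int`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical item index: maps each distinct item text to a small integer id. */
class ItemIndex {
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    /** Returns the existing id for this item text, or assigns the next free one. */
    int idFor(String itemText) {
        Integer id = ids.get(itemText);
        if (id == null) {
            id = ids.size() + 1; // ids start at 1, in order of first appearance
            ids.put(itemText, id);
        }
        return id;
    }

    /** Number of distinct items seen so far. */
    int size() {
        return ids.size();
    }
}

/** Hypothetical row of the gross item table, with the item text replaced by an id. */
class ItemRecord {
    final long id;         // overall index (probably redundant)
    final long sessionId;  // session index, based on the session start time
    final int itemTypeId;  // index for the item type
    final int itemId;      // id from ItemIndex, standing in for the longhand item
    final boolean correct; // Boolean result
    final double rate;     // scoring rate for the item, in capm

    ItemRecord(long id, long sessionId, int itemTypeId, int itemId, boolean correct, double rate) {
        this.id = id;
        this.sessionId = sessionId;
        this.itemTypeId = itemTypeId;
        this.itemId = itemId;
        this.correct = correct;
        this.rate = rate;
    }
}
```

For example, `new ItemRecord(23L, 1247061594801L, 1, index.idFor("3 + 2 ="), true, 8.683068)` would represent the first row shown above, with `index` being an `ItemIndex` shared across all sessions.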
So should I transform the old table into the format of the new one, or should I start again? I think for now I should work with the current "new" table. So I need to suck my old CSV files into that. My next step will therefore be a visit to the Java Tutorial lesson on Basic I/O.
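As a first guess at what that import might look like, the sketch below reads the old three-rows-per-session layout through a `BufferedReader` and emits one line per item in the long format. It makes several assumptions: the old files are comma-separated, blank lines may be skipped, and the old files carry no session timestamp, so sessions are simply numbered upwards from a caller-supplied starting value. The class name `LegacyCsvImporter`, the method `importSessions` and the output layout (`session,item,result,rate`) are all invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LegacyCsvImporter {

    /** Reads sessions of three rows each and returns one "session,item,result,rate" line per item. */
    static List<String> importSessions(Reader source, long firstSessionId) {
        List<String> rows = new ArrayList<>();
        long sessionId = firstSessionId;
        try (BufferedReader in = new BufferedReader(source)) {
            String itemRow;
            while ((itemRow = in.readLine()) != null) {
                if (itemRow.trim().isEmpty()) {
                    continue; // tolerate blank lines between sessions
                }
                String resultRow = in.readLine();
                String rateRow = in.readLine();
                if (resultRow == null || rateRow == null) {
                    throw new IllegalArgumentException("Incomplete session at end of input");
                }
                String[] items = itemRow.split(",");
                String[] results = resultRow.split(",");
                String[] rates = rateRow.split(",");
                if (items.length != results.length || items.length != rates.length) {
                    throw new IllegalArgumentException("Row lengths differ in session " + sessionId);
                }
                // One output line per item, tagged with the session it came from.
                for (int i = 0; i < items.length; i++) {
                    rows.add(sessionId + "," + items[i].trim() + ","
                            + results[i].trim() + "," + rates[i].trim());
                }
                sessionId++; // synthetic session index (assumption)
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

To read a real file, the `Reader` could come from `java.nio.file.Files.newBufferedReader(java.nio.file.Paths.get("sessions.csv"))`, where `sessions.csv` is a placeholder name. Taking a `Reader` rather than a file name keeps the method easy to try out with a `StringReader`.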