
We ran CaMML once for each split (a) without any constraints and (b) with the
ordering constraint that the classification node should be an ancestor of each of the type
nodes. This constraint reflects the general known causal structure. Each run produced
a slightly different network structure, with some having the fineClass node
as a root and some not. Two measures of network complexity were used: (i) the ratio of
arcs to nodes, which varied from 1.4 to 2.2, and (ii) the total number of probabilities in
the CPTs, which varied from about 700 to 144,000. The junction-tree cost was not
used as a measure, although it probably should have been!
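The ordering constraint and the two complexity measures can be made concrete with a small sketch. It assumes a network stored as a hypothetical `{node: [parents]}` dictionary and counts every CPT entry, including those fixed by each distribution summing to one. CaMML itself is not called. Apart from fineClass, the node names and all arities are invented for illustration.

```python
from math import prod


def ancestors(parents, node):
    """Return every ancestor of `node` in a DAG given as {node: [parents]}."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def satisfies_ancestor_constraint(parents, cls, type_nodes):
    """True if the classification node `cls` is an ancestor of every type node."""
    return all(cls in ancestors(parents, t) for t in type_nodes)


def arc_node_ratio(parents):
    """Complexity measure (i): number of arcs divided by number of nodes."""
    n_arcs = sum(len(ps) for ps in parents.values())
    return n_arcs / len(parents)


def total_cpt_size(parents, arity):
    """Complexity measure (ii): total number of entries over all CPTs.

    Each node contributes arity(node) * product of its parents' arities;
    math.prod of an empty sequence is 1, so a root contributes arity(node).
    """
    return sum(arity[x] * prod(arity[p] for p in ps) for x, ps in parents.items())


# Hypothetical toy structure: only the name fineClass comes from the text;
# type1..type3 and all arities are made up for illustration.
parents = {
    "fineClass": [],
    "type1": ["fineClass"],
    "type2": ["fineClass", "type1"],
    "type3": ["type2"],
}
arity = {"fineClass": 4, "type1": 3, "type2": 3, "type3": 3}
types = ["type1", "type2", "type3"]

print(satisfies_ancestor_constraint(parents, "fineClass", types))  # True
print(arc_node_ratio(parents))                                     # 1.0
print(total_cpt_size(parents, arity))                              # 61
```

Reversing the arc between fineClass and type1 would make fineClass a child of type1. That structure would fail the constraint check, which corresponds to the kind of arc reversal discussed below for the unconstrained runs.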
The percentage match results comparing the CaMML BN classifications (constrained
and unconstrained, O-N and H/M/L) are also shown in Table 11.3 (sets 4
and 5), together with the prediction results. The undesirable changes include quite a
few shifts from one specific classification to another, which is particularly bad as far
as our experts are concerned. The variation in results across data sets 1-5
was much higher than when learning parameters for the expert BN
structure, no doubt reflecting the differences between the network structures learned
for the different splits. However, we did not find a clear correlation between the
complexity of the learned network structures and their classification performance.
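The following is a minimal sketch of the kind of comparison behind the percentage match figures. It assumes that each case has one reference classification and one BN classification. The class labels, and the choice of `UN` as the only non-specific label, are hypothetical placeholders rather than the actual categories used in Table 11.3.

```python
from collections import Counter


def percentage_match(reference, predicted):
    """Percentage of cases where the BN classification equals the reference one."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty classification lists of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return 100.0 * matches / len(reference)


def specific_shifts(reference, predicted, non_specific):
    """Count (reference, predicted) pairs where one specific class became another."""
    return Counter(
        (r, p)
        for r, p in zip(reference, predicted)
        if r != p and r not in non_specific and p not in non_specific
    )


# Hypothetical labels for five students; "UN" stands in for a non-specific class.
reference = ["A1", "A2", "L1", "UN", "S1"]
predicted = ["A1", "L1", "L1", "A1", "S3"]

print(percentage_match(reference, predicted))  # 40.0
shifts = specific_shifts(reference, predicted, {"UN"})
print(sum(shifts.values()))                    # 2
```

The `UN` to `A1` change counts as a mismatch but not as a specific-to-specific shift. This follows the text, which singles out shifts between specific classifications as particularly bad.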
Our experts also looked at the learnt structures during this phase of the project,
but they did not have an intuitive feel for how these structures were modeling the
domain. In particular, the change in the direction of the arc between the classification
node and (one or more of) the item type nodes did not reflect the causal model we had
introduced them to during the elicitation phase. Also, there were many arcs between
item type nodes, and the experts could not explain these dependencies with their domain
knowledge. Some time later, one of the experts investigated one of these learnt
structures using Matilda (see Section 9.3.2.2), as part of Matilda’s evaluation process. By using
this KEBN support tool, the expert gained some understanding of the dependencies
captured in the seemingly non-intuitive structure. Overall, however, the lack of an
adequate explanation for the learnt structures, together with the unacceptable
number of undesirable re-classifications, meant that none of the learnt structures were considered for
inclusion in the implemented system.
11.3.5 Field trial evaluation
All components of the complete system were field-tested. The games were trialed
with individual students holding known misconceptions, and their responses and
learning were tracked [187]. This refined the design of the games and of
the visual scaffolding and led to the decision to provide the feedback and visual
scaffolding automatically.
The complete system has also been field-tested with 25 students in Grades 5 and
6 who had persistent misconceptions after normal school instruction [84, 105].
Students worked with a partner (almost always with the same misconception) for up to
30 minutes, without adult intervention. The observer recorded their conversations,
which were linked to computer results and analyzed to see where students learned or
missed learning opportunities and how cognitive conflict was involved. Long-term
conceptual change was measured by re-administering the DCT about three weeks
© 2004 by Chapman & Hall/CRC Press LLC