• For a presence/absence variable, the Anderberg score is 1 for a mismatch or 0 for
either a present–present match or an absent–absent match. It thus amounts, for
presence/absence variables, to a kind of simple mismatching coefficient. That is,
it works like the Simple Matching Coefficient turned into a dissimilarity where
larger values indicate greater dissimilarity.
• For a variable with multiple unranked categories, the Anderberg score is 0 for
a pair of cases falling in the same category or 1 for a pair of cases falling in
different categories.
• For ranks, the Anderberg score is the absolute value of the difference between the
category codes divided by one less than the number of categories. For example, if
a variable has five categories (1, 2, 3, 4, and 5), cases coded 2 and 4, respectively,
would receive a score of 2/4 or 0.5000.
• For measurements, the Anderberg score is the absolute value of the difference
between the measurements for the two cases divided by the range of the measurements
in the batch. Anderberg recommends using the square root of this
score rather than the raw score to lessen the impact of outliers.
Once a score is determined for each variable, all the scores are averaged to produce
the final Anderberg’s Coefficient for the pair of cases under consideration. Like the
Gower scores, the Anderberg scores have a minimum value of 0 and a maximum
value of 1, so the final coefficient also ranges from 0 to 1. Unlike Gower’s Coefficient,
Anderberg’s Coefficient, calculated in this way, is a dissimilarity coefficient.
A value of 0 means identical cases; a value of 1 means totally dissimilar cases.
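The scoring rules and the final averaging step described above can be sketched in a few lines of Python. This is only a minimal illustration, not code from any statistical package: the function names, the `kind` labels, the dictionary layout, and the two example cases are all hypothetical, and the square-root adjustment for measurements is left as an option.

```python
from math import sqrt


def anderberg_score(kind, a, b, n_categories=None, value_range=None,
                    use_sqrt=False):
    """Dissimilarity score (0 to 1) for one variable and one pair of cases.

    The `kind` labels are hypothetical names chosen for this sketch.
    """
    if kind in ("presence_absence", "category"):
        # 0 for any match (including absent-absent), 1 for a mismatch.
        return 0.0 if a == b else 1.0
    if kind == "rank":
        # Difference in category codes over (number of categories - 1).
        return abs(a - b) / (n_categories - 1)
    if kind == "measurement":
        if value_range == 0:
            # Every case in the batch has the same value: no dissimilarity.
            return 0.0
        score = abs(a - b) / value_range
        # Optional square root, the adjustment recommended for outliers.
        return sqrt(score) if use_sqrt else score
    raise ValueError(f"unknown variable kind: {kind!r}")


def anderberg_coefficient(variables, case_a, case_b, use_sqrt=False):
    """Average the per-variable scores for one pair of cases."""
    scores = [
        anderberg_score(
            spec["kind"], case_a[name], case_b[name],
            n_categories=spec.get("n_categories"),
            value_range=spec.get("value_range"),
            use_sqrt=use_sqrt,
        )
        for name, spec in variables.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical variables and cases, invented purely for illustration.
variables = {
    "hearth": {"kind": "presence_absence"},
    "ware": {"kind": "category"},
    "wealth_rank": {"kind": "rank", "n_categories": 5},
    "floor_area": {"kind": "measurement", "value_range": 80.0},
}
case_a = {"hearth": 1, "ware": "A", "wealth_rank": 2, "floor_area": 10.0}
case_b = {"hearth": 0, "ware": "A", "wealth_rank": 4, "floor_area": 30.0}

print(anderberg_coefficient(variables, case_a, case_b))
print(anderberg_coefficient(variables, case_a, case_b, use_sqrt=True))
```

For these invented cases the four scores are 1, 0, 0.5, and 0.25, so the coefficient is 0.4375; with the square-root option the measurement score becomes 0.5 and the coefficient rises to 0.5.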
SIMILARITIES BETWEEN IXCAQUIXTLA HOUSEHOLD UNITS
Table 22.9 shows Gower’s Coefficient of similarity between the household units at
Ixcaquixtla from the data in Table 21.1. That dataset, as discussed in Chapter 21,
contains both measurements and ranks, along with two presence/absence variables
where present–present matches seem more meaningful than absent–absent matches.
It was because of this mixture of variables for which different treatments seem
appropriate that Gower’s Coefficient was chosen. As a practical matter, it is always
a good idea to examine a matrix of similarity scores like this. There are many possibilities
for making mistakes – either with the software or in thinking through the
principles of the chosen coefficient – and it is always reassuring to notice that pairs
of cases whose values across the variables seem quite similar come out with high
similarity scores, and the pairs of cases whose values across the variables seem
quite different come out with low similarity scores. For example, Household Units
2 and 5 show up in Table 22.9 with a very high similarity score (0.8916). A look
at Table 21.1 shows that these two household units have quite similar values on
the majority of the variables. In contrast, Household Units 14 and 20 show up in
Table 22.9 with a very low similarity score (0.3733). Again, a look at Table 21.1 is