268 CHAPTER 21
cannot be taken to mean “twice as much,” as is the case with real measurements.
Here Energy Invested in Burials takes the form of a variable with three categories,
and is, in this respect, much like the other category variables. It differs from the other
category variables in that the three ranks definitely come in a prescribed order. We
recognize that to say low, high, medium is to put the categories out of their correct
order. When we assign numbers to them, then, we assign 1 to low, 2 to medium, and
3 to high so that mathematical manipulations of those numbers can recognize that
in a very meaningful sense medium falls between low and high.
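Here is a minimal Python sketch of this coding. It assumes the ranks are recorded as the words low, medium, and high. The names ENERGY_CODES, code_energy, and sort_ranks are hypothetical and do not come from any statpack. Sorting by the assigned codes recovers the prescribed order, which is the information the numbers are meant to carry. The codes say nothing about how much more energy high represents than low.

```python
# Hypothetical coding scheme for the ordinal variable
# Energy Invested in Burials: the numbers record rank order only.
ENERGY_CODES = {"low": 1, "medium": 2, "high": 3}


def code_energy(rank: str) -> int:
    """Return the numeric code for a rank of Energy Invested in Burials."""
    return ENERGY_CODES[rank.strip().lower()]


def sort_ranks(ranks):
    """Put ranks into their prescribed order by sorting on their codes."""
    return sorted(ranks, key=code_energy)


# Listed out of order, the ranks are restored to low, medium, high.
print(sort_ranks(["low", "high", "medium"]))  # ['low', 'medium', 'high']

# Medium falls between low and high, as the coding intends.
print(code_energy("low") < code_energy("medium") < code_energy("high"))  # True
```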
The data in Table 21.1 are organized as they might need to be for a statpack.
The measurements, of course, are represented by their numeric values, as always.
The categories are also now represented by assigned numeric values, something we
have not needed to do previously with categories. It is necessary in a multivariate
dataset because the values of the variables must be manipulated mathematically. We
cannot assign, say, “P” to present and “A” to absent for Platform, because we cannot
add, subtract, divide, and multiply with “P” and “A.” In principle, we could assign a
value of 0 to present and a value of 1 to absent, but it is customary to do it the other
way around. Most software that distinguishes presence/absence variables from other
kinds of variables is easier to use if 0 is the value for absent and 1 is the value
for present.
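The following minimal Python sketch converts recorded "P"/"A" entries for Platform into the customary 0/1 coding. The helper name code_presence and the five entries are invented for illustration. One convenience of using 1 for present is that ordinary arithmetic on the codes reads naturally: the sum counts the cases that have a platform, and the mean gives the proportion of cases that have one.

```python
# Hypothetical helper: convert a recorded "P"/"A" entry for Platform
# into the customary coding, 1 = present and 0 = absent.
def code_presence(entry: str) -> int:
    mapping = {"P": 1, "A": 0}
    return mapping[entry.strip().upper()]


platform = ["P", "A", "A", "P", "P"]  # invented example entries
coded = [code_presence(e) for e in platform]
print(coded)  # [1, 0, 0, 1, 1]

# With 1 for present, the sum counts the cases with a platform
# and the mean is the proportion of cases with a platform.
print(sum(coded), sum(coded) / len(coded))  # 3 0.6
```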
Values of 1, 2, and 3 have been assigned to Energy Invested in Burials, just as we
have done previously with ranks. In principle, we could have assigned 3 to low, 2 to
medium, and 1 to high, but it is much less confusing to assign low numeric values to
low amounts and high numeric values to high amounts. If the dataset had a category
variable like Wall Construction (wattle-and-daub, wood-plank, mud-brick, or
wood-plank-with-mud-brick), we would assign numbers to each of these categories as
well. Ordinarily we would not use 0 as one of the values for this variable, since
none of the categories really means absence. Instead we might use 1 for
wattle-and-daub, 2 for wood-plank, 3 for mud-brick, and 4 for
wood-plank-with-mud-brick. For this variable, though, we could just as easily
shuffle the number values around. Any of the four
categories might be assigned a value of 1 since the number values do not represent
any sense of ranking of the four categories. It must be remembered that the numbers
assigned to the categories for Energy Invested in Burials convey information about
ranks, while those assigned to the categories for Wall Construction would not. This
distinction can matter in multivariate analysis.
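The minimal Python sketch below contrasts the two kinds of category variables. It assumes each coding is stored as a dictionary, and the names WALL_CODES, ENERGY_CODES, is_valid_nominal, and keeps_ranks are hypothetical. For Wall Construction, any one-to-one assignment of numbers is acceptable. For Energy Invested in Burials, the codes must run in rank order: either 1, 2, 3 or, less conveniently, 3, 2, 1.

```python
# Hypothetical codes for the nominal variable Wall Construction.
# The numbers are labels only; they carry no ranking.
WALL_CODES = {
    "wattle-and-daub": 1,
    "wood-plank": 2,
    "mud-brick": 3,
    "wood-plank-with-mud-brick": 4,
}

# Ordinal codes for Energy Invested in Burials, with the rank order.
ORDER = ["low", "medium", "high"]
ENERGY_CODES = {"low": 1, "medium": 2, "high": 3}


def is_valid_nominal(codes):
    """A nominal coding only needs a distinct number for each category."""
    return len(set(codes.values())) == len(codes)


def keeps_ranks(codes, ordered_categories):
    """True if the codes rise (or fall) steadily along the rank order."""
    values = [codes[c] for c in ordered_categories]
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


# Shuffling the Wall Construction codes still gives a usable coding.
shuffled_wall = {"wattle-and-daub": 3, "wood-plank": 1,
                 "mud-brick": 4, "wood-plank-with-mud-brick": 2}
print(is_valid_nominal(shuffled_wall))  # True

# Reversing the Energy codes keeps the ranks; shuffling them does not.
reversed_energy = {"low": 3, "medium": 2, "high": 1}
shuffled_energy = {"low": 2, "medium": 3, "high": 1}
print(keeps_ranks(ENERGY_CODES, ORDER))     # True
print(keeps_ranks(reversed_energy, ORDER))  # True
print(keeps_ranks(shuffled_energy, ORDER))  # False
```

The nominal check never looks at the order of the numbers, because for Wall Construction those numbers are names rather than ranks.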
The notion of missing data also plays an especially important role in multivariate
analysis. We have not needed to be concerned about it previously because missing
data usually takes care of itself when dealing with one variable or two. If a scraper
is broken, and we cannot measure its length, then automatically no measurement
for it appears in the batch of numbers we are exploring and calculating indexes for.
That scraper just disappears from the sample when we look at length measurements.
It might well reappear when we look at the batch consisting of categories of raw
material. The fact that it is broken would not prevent identifying the raw material of
which it was made. If we investigated the relationship between scraper length and
raw material, the broken scraper would disappear again. It would disappear because
we would have no measurement for its length and could not include that case in