
covariance matrix. Even machine-based data entry is not error-free (e.g., smudges on
forms can “fool” an electronic scanner, and software errors can result in the calculation
of incorrect scores). Mistaken specification of codes in statistical programs is also common
(e.g., “9” for missing data instead of “–9”).
16. Ignore whether the pattern of data loss is random or systematic. This point
assumes that there are more than just a few missing scores. Classical statistical methods
for dealing with incomplete data, such as case deletion or single-imputation methods,
generally assume that the data loss pattern is missing completely at random (MCAR), an
assumption that is unlikely to hold in most data sets analyzed in the behavioral sciences. These classical
techniques have little basis in statistical theory and take little advantage of structure in
the data. More modern methods, including those that impute multiple scores for miss-
ing observations based on predictive theoretical distributions, generally assume that
the data loss pattern is missing at random (MAR), a less strict assumption about randomness.
But even these methods may generate inaccurate results if the data loss mechanism is
systematic. If so, then (1) there is no “statistical fix” for the problem, and (2) you need to
explicitly qualify the interpretation of the results in light of the data loss pattern; a brief screening sketch follows this point.
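To make the screening concrete, here is a minimal sketch in Python (using pandas and scikit-learn rather than a dedicated SEM program) that tabulates the joint patterns of missingness, checks whether missingness on one variable is related to observed scores on another, and then generates several model-based imputed data sets. The file name, variable names, and choice of five imputations are hypothetical, and all variables are assumed to be numeric.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical file and variable names; all columns assumed numeric.
df = pd.read_csv("study_data.csv")

# Amount missing per variable and the joint patterns of missingness
# (each row of the second table is one observed pattern).
print(df.isna().sum())
print(df.isna().value_counts())

# A quick check against MCAR: does missingness on y1 relate to
# observed scores on x1?
print(df.groupby(df["y1"].isna())["x1"].mean())

# Model-based imputation repeated m times with sample_posterior=True,
# in the spirit of multiple imputation.
m = 5
imputed_sets = []
for k in range(m):
    imputer = IterativeImputer(sample_posterior=True, random_state=k)
    completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    imputed_sets.append(completed)
```

Each completed data set would then be analyzed in the SEM program and the results pooled; none of this, of course, is a remedy when the data loss mechanism is systematic.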
17. Fail to examine distributional characteristics. The most widely used estimation
methods in SEM, including maximum likelihood (ML), assume multivariate normal dis-
tributions for continuous endogenous variables. Although values of parameter estimates
are relatively robust against non-normality, statistical tests of individual parameters
tend to be positively biased (i.e., Type I error rate is inflated). If the distributions of con-
tinuous endogenous variables are severely non-normal, then use an estimation method
that does not assume normality or use corrected statistics (e.g., robust standard errors,
corrected model test statistics) when normal theory methods such as ML estimation
are used. If the distributions are non-normal because the indicators are discrete with
a small number of categories (i.e., they are ordered-categorical variables), then use an
appropriate method for this type of data, such as robust weighted least squares (WLS); a distribution-screening sketch follows this point.
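As one concrete screening step, the following sketch (Python, using pandas and SciPy) computes the skew and excess kurtosis of each continuous endogenous variable and flags severe departures from normality. The file name and variable list are hypothetical, and the cutoffs of 3 and 10 are rough rules of thumb rather than fixed standards.

```python
import pandas as pd
from scipy.stats import skew, kurtosis

# Hypothetical file and variable names; cutoffs are rules of thumb only.
df = pd.read_csv("study_data.csv")
endogenous = ["y1", "y2", "y3"]

for var in endogenous:
    scores = df[var].dropna()
    s = skew(scores)
    k = kurtosis(scores)  # excess kurtosis; 0 for a normal distribution
    flag = "CHECK" if abs(s) > 3 or abs(k) > 10 else "ok"
    print(f"{var}: skew = {s:.2f}, kurtosis = {k:.2f}  [{flag}]")
```

Flagged variables would then prompt one of the remedies just described, requested in the SEM program itself (e.g., robust standard errors and a corrected model test statistic, or robust WLS for ordered-categorical indicators).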
18. Don’t screen for outliers. Even a few extreme scores in a relatively small sample
can distort the results. If it is unclear whether outlier cases are from a different popula-
tion, the analysis can be run with and without these cases in the sample. This strategy
makes clear the effect of the outliers on the results (see the screening sketch after this point). The same strategy can be used to evaluate the effects of different methods for dealing with missing data.
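One common way to flag multivariate outliers before running the analysis with and without them is the squared Mahalanobis distance, sketched below in Python with NumPy and SciPy. The file name and variable list are hypothetical, and the .999 chi-square cutoff is one conservative convention, not a fixed rule.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

# Hypothetical file and variable list; the .999 cutoff is one convention.
df = pd.read_csv("study_data.csv")
X = df[["x1", "x2", "y1", "y2"]].dropna()

center = X.mean(axis=0).to_numpy()
cov_inv = np.linalg.inv(np.cov(X.to_numpy(), rowvar=False))
diffs = X.to_numpy() - center
d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)  # squared Mahalanobis distances

# Compare each distance with a chi-square critical value
# (degrees of freedom = number of variables).
cutoff = chi2.ppf(0.999, df=X.shape[1])
flagged = X.index[d2 > cutoff]
print("Flagged cases:", list(flagged))

# Fit the model twice, once with and once without the flagged cases,
# and compare the two sets of estimates.
analysis_all = df
analysis_trimmed = df.drop(index=flagged)
```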
19. Assume that all relations are linear. A standard assumption in SEM is that variable
relations are linear. Curvilinear or interactive relations can be represented with product
terms but, in general, such terms must be created by the researcher and then included
in the model. Simple visual scanning of scatterplots can detect bivariate relations that
are obviously curvilinear, but there is no comparably easy visual check for interaction
effects. Model test statistics, including the model chi-square, are generally insensitive to serious interaction misspecification (i.e., there is a real interaction, but the model includes no corresponding product terms to represent it); a sketch of creating such product terms follows this point.
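Creating such terms is straightforward in any data-management environment. The Python/pandas sketch below forms a mean-centered product term and a power term and draws a quick scatterplot; the variable names are hypothetical, and mean centering before forming the product is a common convention rather than a requirement.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical variable names; centering first is a common convention.
df = pd.read_csv("study_data.csv")

df["x1_c"] = df["x1"] - df["x1"].mean()
df["x2_c"] = df["x2"] - df["x2"].mean()
df["x1_x2"] = df["x1_c"] * df["x2_c"]  # product term for an interaction effect
df["x1_sq"] = df["x1_c"] ** 2          # power term for a curvilinear trend

# Quick visual check for an obviously curvilinear bivariate relation.
df.plot.scatter(x="x1", y="y1")
plt.show()
```

The new columns would then be named as observed variables in the model specification.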
20. Ignore lack of independence among the scores. This problem may arise in two
contexts. The first is when the scores come from a repeated-measures variable; the ability
in SEM to specify a model for the error covariances addresses this context. The second context refers