218 CHAPTER 15
Statpacks
Regression analysis is hardly ever performed any more except by computer.
Different statpacks use a variety of vocabularies to talk about it, in part because
linear regression is only the tip of the iceberg. Regression analysis is really a
whole family of analytical approaches involving curved line fitting in addition
to straight line fitting and incorporating a number of variables simultaneously
instead of just two. Any very large and powerful statpack will perform many
of these other kinds of analysis as well, and the simple, but powerful, linear
regression techniques discussed here may be embedded in this broader family
of analyses. Consequently, the commands or menu selections that produce a
simple linear regression vary substantially from one statpack to another and
are often much more complicated than they seem to need to be. Recourse
to the manual or help system for your particular program is likely to be necessary.
Some statpacks integrate scatter plots into the procedures that perform
regression analysis as an option, while others perform the numerical analysis
as one operation and produce scatter plots as a different operation. Usually the
inclusion of the curves delimiting a confidence region for the best-fit straight
line is an option to be specified as part of the production of a scatter plot.
Residuals, of course, are calculated as part of the regression analysis, but to be able
to use them as a new measurement and pursue further analysis with them it is
usually necessary to save them by specifying this as an option to the regression
analysis. Typically this results in the creation of a new data file in the normal
format your statpack uses for data files. The new file will have the same cases
as the original data file and a variable whose values are the residuals from the
regression analysis.
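As a rough illustration of what saving residuals amounts to, the following Python sketch fits a least-squares straight line and stores the residuals alongside the original cases as a new variable. It is not the procedure of any particular statpack; the function names (fit_line, with_residuals) and the data values are hypothetical and invented purely for the example.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

def with_residuals(x, y):
    """Return the original cases plus a new 'residual' variable."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    intercept, slope = fit_line(x, y)
    predicted = intercept + slope * x
    # Residual = observed value minus the value predicted by the line.
    return {"x": x, "y": y, "residual": y - predicted}

# Hypothetical measurements, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

table = with_residuals(x, y)
for xi, yi, ri in zip(table["x"], table["y"], table["residual"]):
    print(f"{xi:4.1f} {yi:5.1f} {ri:7.3f}")
```

The resulting table has the same cases as the original data, and the residual column can be treated as a new measurement in any further analysis.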
Second, oval shapes of points with very thin sections (or even worse, two or more
separate oval clouds) are the equivalent of multipeaked shapes for single batches
of numbers. They can create the same kinds of problems in linear regression that
outliers do. Fig. 15.10 shows another extreme example, where two ovals of points
showing negative correlations of some strength turn into a single best-fit straight line
with a positive slope when improperly analyzed together. Such a shape may occur
in a scatter plot of two variables that, when looked at individually, have clearly
single-peaked and symmetrical shapes. Shapes like this should be broken apart for
separate analysis.
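The effect of lumping two separate clouds together can be checked numerically. The Python sketch below uses two small made-up groups of points (not the data behind the figure), each with a negative slope of its own, and compares the slopes of the separate best-fit lines with the slope obtained when the groups are pooled. The helper name slope and all data values are assumptions introduced only for illustration.

```python
import numpy as np

def slope(x, y):
    """Slope of the least-squares straight line for y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)

# Hypothetical group A: y falls as x rises.
a_x = np.array([1.0, 2.0, 3.0, 4.0])
a_y = np.array([4.0, 3.0, 2.0, 1.0])

# Hypothetical group B: same downward pattern, shifted up and to the right.
b_x = a_x + 10.0
b_y = a_y + 10.0

slope_a = slope(a_x, a_y)
slope_b = slope(b_x, b_y)

# Pool both groups as if they were one batch of cases.
slope_pooled = slope(np.concatenate([a_x, b_x]), np.concatenate([a_y, b_y]))

print("group A slope:", slope_a)
print("group B slope:", slope_b)
print("pooled slope: ", slope_pooled)
```

Each group taken alone yields a negative slope, while the pooled analysis yields a positive one, because the offset between the two clouds dominates the pattern within each of them.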
Third, tendencies toward curved patterns in the oval of points can prevent a very
good fit of a straight line to a fundamentally linear pattern that just happens to
be curved. There are ways to extend the logic of linear regression to more complex
curvilinear relationships between variables, but it is usually much easier to
straighten out the curve by transforming one or both variables. The kinds of transformations
required are very like the transformations discussed in Chapter 5 and
may be applied to either or both of the variables to remove tendencies toward cur-