respectively. The distance is estimated in the obvious way by replacing probabilities with
observed sample frequencies. The distance has been used in studies of chromosome
polymorphism as well as in other fields such as anthropology and sociology. [Annual Review of
Anthropology, 1985, 14, 343–73.]
Prewhitening: A term for transformations of time series intended to make their spectrum more
nearly that of a white noise process. [TMS Chapter 7.]
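One common form of prewhitening, shown here only as a minimal sketch, filters the series with a fitted first-order autoregressive model. The choice of an AR(1) filter, the simulated series and the function names are illustrative assumptions, not part of the definition above.

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def prewhiten_ar1(x):
    """Filter x with a fitted AR(1) model; return (phi_hat, residuals)."""
    mean = sum(x) / len(x)
    phi_hat = lag1_autocorr(x)          # simple moment estimate of the AR(1) coefficient
    centred = [v - mean for v in x]
    resid = [centred[t] - phi_hat * centred[t - 1] for t in range(1, len(x))]
    return phi_hat, resid

# Simulated AR(1) series with coefficient 0.8 (illustrative data only).
rng = random.Random(0)
series = [0.0]
for _ in range(1999):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

phi_hat, whitened = prewhiten_ar1(series)
r_before = lag1_autocorr(series)    # strongly positive for the raw series
r_after = lag1_autocorr(whitened)   # close to zero after filtering
```

If the AR(1) assumption is adequate, the filtered series should show little remaining serial correlation, which is the time-domain counterpart of a flatter spectrum.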
Principal components analysis: A procedure for analysing multivariate data which transforms
the original variables into new ones that are uncorrelated and account for decreasing
proportions of the variance in the data. The aim of the method is to reduce the dimensionality
of the data. The new variables, the principal components, are defined as linear functions of
the original variables. If the first few principal components account for a large percentage of
the variance of the observations (say above 70%) they can be used both to simplify
subsequent analyses and to display and summarize the data in a parsimonious manner.
See also factor analysis. [MV1 Chapter 2.]
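A minimal sketch of the transformation, assuming the components are extracted from the sample variance-covariance matrix by an eigendecomposition (the correlation matrix is often used instead). The simulated data and the function name are illustrative.

```python
import numpy as np

def principal_components(X):
    """Return (variances, loadings, scores) from the sample covariance matrix of X."""
    Xc = X - X.mean(axis=0)                  # centre each variable
    S = np.cov(Xc, rowvar=False)             # sample variance-covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh is appropriate because S is symmetric
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # new variables: linear functions of the originals
    return eigvals, eigvecs, scores

# Illustrative data: three variables sharing a common source plus one unrelated variable.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
X = np.hstack([common + 0.3 * rng.normal(size=(500, 3)), rng.normal(size=(500, 1))])

variances, loadings, scores = principal_components(X)
explained = variances / variances.sum()      # proportion of variance per component
```

The columns of `scores` are uncorrelated and their variances decrease from left to right, so `np.cumsum(explained)` can be inspected to decide how many components to retain.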
Principal components regression analysis: A procedure often used to overcome the problem of
multicollinearity in regression, when simply deleting a number of the explanatory variables is
not considered appropriate. Essentially the response variable is regressed on a small number
of principal component scores resulting from a principal components analysis of the
explanatory variables. [ARA Chapter 9.]
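The procedure can be sketched as follows, assuming components are taken from the covariance matrix of the explanatory variables and the regression on the retained scores is by ordinary least squares. The function name `pcr_fit` and the simulated data are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Regress y on the leading principal component scores of X.

    Returns the intercept and the coefficients expressed on the original variables.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    V = eigvecs[:, order]                       # loadings of the retained components
    scores = Xc @ V
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                            # map back to the original variables
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Illustrative data in which x2 is nearly collinear with x1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 1.0 + x1 + x2 + 2.0 * x3 + 0.1 * rng.normal(size=200)

intercept, beta = pcr_fit(X, y, n_components=2)
fitted = intercept + X @ beta
```

Dropping the third component removes the near-degenerate direction x1 - x2, so the coefficients of the two collinear variables are estimated stably rather than being traded off against each other.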
Principal coordinates analysis: Synonym for classical scaling.
Principal curve: A smooth, one-dimensional curve that passes through the middle of a q-dimensional
data set; it is nonparametric, and its shape is suggested by the data. [Annals of Statistics,
1996, 24, 1511–20.]
Principal factor analysis: A method of factor analysis which is essentially equivalent to a
principal components analysis performed on the reduced covariance matrix obtained by
replacing the diagonal elements of the sample variance–covariance matrix with estimated
communalities. Two frequently used estimates of the latter are (a) the square of the multiple
correlation coefficient of the ith variable with all other variables, (b) the largest of the
absolute values of the correlation coefficients between the ith variable and one of the other
variables. See also maximum likelihood factor analysis. [Applied Multivariate Data Analysis,
2nd edition, 2001, B. S. Everitt and G. Dunn, Edward Arnold, London.]
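A minimal, non-iterated sketch using communality estimate (a). It assumes the analysis is carried out on a correlation matrix R, in which case the squared multiple correlation of variable i is 1 - 1/r^ii, where r^ii is the ith diagonal element of the inverse of R. The function name and the one-factor example matrix are illustrative.

```python
import numpy as np

def principal_factor(R, n_factors):
    """Non-iterated principal factor solution from a correlation matrix R."""
    # Estimate (a): squared multiple correlation of each variable with the others.
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)            # reduced matrix: communalities on the diagonal
    eigvals, eigvecs = np.linalg.eigh(R_reduced)
    order = np.argsort(eigvals)[::-1][:n_factors]
    lam = np.clip(eigvals[order], 0.0, None)    # the reduced matrix may have negative eigenvalues
    loadings = eigvecs[:, order] * np.sqrt(lam)
    return smc, loadings

# Illustrative correlation matrix generated by a single factor.
true_loadings = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

smc, loadings = principal_factor(R, n_factors=1)
```

The squared multiple correlations are lower bounds for the communalities, so the loadings from this single pass tend to be somewhat too small; iterating the communality estimates is a common refinement not shown here.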
Principal Hessian directions: A method based on the Hessian matrix of a regression function
that can be effective for detecting and visualizing nonlinearities. [Journal of the American
Statistical Association, 1992, 87, 1025–39.]
Principal oscillation pattern analysis (POPS): A method for isolating spatial patterns with a
strong temporal dependence, particularly in the atmospheric sciences. Based on the
assumption of a first-order Markov chain. [Journal of Climate, 1995, 8, 377–400.]
Principal points: Points ξ₁, ξ₂, ..., ξₖ which minimize the expected squared distance of a
p-variate random variable x from the nearest of the ξᵢ. [Statistics and Computing, 1996, 6,
187–90.]
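For a sample, the principal points can be estimated by the centres found with Lloyd's (k-means) iterations. The sketch below covers the univariate case only, starts from sample quantiles, and uses simulated standard normal data; all of these are illustrative assumptions. For a standard normal variable with k = 2 the principal points are ±√(2/π) ≈ ±0.798, which gives a check on the estimate.

```python
import random

def principal_points_1d(sample, k, n_iter=50):
    """Estimate k principal points of a univariate sample by Lloyd's iterations."""
    data = sorted(sample)
    n = len(data)
    # Start from evenly spaced sample quantiles.
    points = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in data:
            # Assign each observation to its nearest current point.
            nearest = min(range(k), key=lambda j: (x - points[j]) ** 2)
            groups[nearest].append(x)
        # Move each point to the mean of the observations assigned to it.
        new_points = [sum(g) / len(g) if g else points[j] for j, g in enumerate(groups)]
        if new_points == points:
            break
        points = new_points
    return points

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
points = principal_points_1d(sample, k=2)
```

Lloyd's iterations only guarantee a local minimum of the mean squared distance, so in less regular cases several starting configurations would normally be tried.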
Principal stratification: A method for adjusting for a response variable C that is intermediate on
the causal pathway from a treatment to the final response y. C has potential outcomes C(1)
and C(0), respectively, when a unit is assigned treatment 1 or 0 (see Neyman-Rubin causal
model). A common mistake is to treat C^OBS, the observed value of C, as if it were a covariate
and conduct analysis stratified on C^OBS. The correct approach is to stratify on
the principal strata (C(1), C(0)), latent classes which are unaffected by treatment