Vidakovic B. Statistics for Bioengineering Sciences: With Matlab and WinBugs Support

Подождите немного. Документ загружается.

12.2 Sign Test 479

number of + is equal to

i=1

1(X

> m

)

and has a binomial distribution with parameters n and 1/2.

Let the level of the test,

α, be speciﬁed. When the alternative is H

: med >

, the critical values of T are integers greater than or equal to k

, which is

deﬁned as the smallest integer for which the relationship

t=k

<α

holds.

Likewise, if the alternative is H

: med < m

, the critical values of T are

integers less than or equal to k

, which is deﬁned as the largest integer for

which the relationship

t=0

<α

holds.

If the alternative is two-sided, i.e., H

: med 6= m

, the critical values of T

are integers less than or equal to k

α/2

and greater than or equal to k

α/2

, which

are deﬁned, respectively, as the largest and smallest integers for which the

inequalities

α/2

t=0

<α/2, and

t=k

α/2

<α/2

hold.

If the value T is observed, then in testing against the alternative H

med

> m

, large values of T are critical and the p-value is p =

−n

n−T

−n

. When testing against the alternative H

: med < m

, small val-

ues of T are critical and the p-value is p

−n

. When the alternative

is two-sided, the p-value is p

i=0

−n

for T

=min{T, n −T}.

Alternative p-value

: med > m

−n

: med 6= m

i=0

−n

for T

=min{T, n −T}

: med < m

−n

480 12 Distribution-Free Tests

Consider now the two-paired-sample case, and suppose that the samples

,... , X

and Y

,... , Y

are observed. We are interested in knowing whether

the population median of the differences X

−Y

, X

−Y

,... , X

−Y

is equal

to 0. In this case, T

1(X

> Y

), is the total number of strictly positive

differences.

Although it is true that the hypothesis of equality of means is equivalent

to the hypothesis that the mean of the differences is 0, for the medians an

analogous statement is not true in general. More precisely, if D

= X −Y , then

med(D) may not be equal to med(X )

−med(Y ). Thus, with the sign test we are

not testing the equality of medians, but whether the median of the differences

is 0.

Ties complicate the calculations but can be handled. Even when observa-

tions come from a continuous distribution, ties appear due to limited precision

in the application part. There are several ways of dealing with ties:

(i) Ignore them. If there are s ties, use only the “untied” observations. Of

course, the sample size drops to n

−s.

(ii) Assign the winning sign to tied pairs. For example, if there are two

minuses, two ties, and six pluses, consider the two ties as pluses.

(iii) Randomize. If you have two ties, ﬂip a coin twice and assign a plus if

the coin lands heads and minus if the coin lands tails.

In script

signtst.m conducting the sign test (not to be mixed with MAT-

LAB’s built in

signtest), the options for handling ties are: I, C, and R, for poli-

cies in (i)-(iii).

Example 12.1. TCDD Levels. Many Vietnam veterans have dangerously

high levels of the dioxin 2,3,7,8-TCDD in their blood and fat tissue as a result

of their exposure to the defoliant Agent Orange. A study published in Chemo-

sphere (vol. 20, 1990) reported on the TCDD levels of 20 Massachusetts Viet-

nam veterans who had possibly been exposed to Agent Orange. The amounts

of TCDD (measured in parts per trillion) in blood plasma and fat tissue drawn

from each veteran are shown in the table below.

TCDD levels in plasma TCDD levels in fat tissue

2.5 3.1 2.1 4.9 5.9 4.4

3.5 3.1 1.8

6.9 7.0 4.2

6.8 3.0 36.0

10.0 5.5 41.0

4.7 6.9 3.3

4.4 7.0 2.9

4.6 1.6 7.2

4.6 1.4 7.7

1.8 20.0 2.0

1.1 11.0 2.5

2.5 4.1

2.3 2.5

Is there sufﬁcient evidence of a difference between the distributions of TCDD

levels in plasma and fat tissue for Vietnam veterans exposed to Agent Orange?

Use the sign test and

α =0.10.

12.3 Ranks 481

tcddpla = [2.5 3.1 2.1 3.5 3.1 1.8 6.8 3.0 36.0 ...

4.7 6.9 3.3 4.6 1.6 7.2 1.8 20.0 2.0 2.5 4.1];

tcddfat = [4.9 5.9 4.4 3.5 7.0 4.2 10.0 5.5 41.0 ...

4.4 7.0 2.9 4.6 1.4 7.7 1.8 11.0 2.5 2.3 2.5];

% ignore ties

[pvae, pvaa, n, plusses, ties] = signtst(tcddpla, tcddfat)

%pvae =0.1662

%pvaa =0.1660

%n =17

%plusses =6

%ties =3

% randomize ties

[pvae, pvaa, n, plusses, ties] = signtst(tcddpla, tcddfat,’R’)

% pvae =0.2517

% take the conservative, least favorable approach

[pvae, pvaa, n, plusses, ties] = signtst(tcddpla, tcddfat,’C’)

% pvae =0.0577

Overall, the sign test failed to ﬁnd signiﬁcant differences between the dis-

tributions of TCDD levels. Only the conservative assignment of ties (least fa-

vorable to H

) produced p-value of 0.0577, signiﬁcant at α = 10% level. Com-

pare these results with MATLAB’s built-in function

signtest.

12.3 Ranks

Let X

, X

,... , X

be a sample from a population with a continuous distri-

bution F. Many distribution-free procedures are based on how observations

within the sample are ranked compared to either a parameter

θ or to another

sample. The ranks of a sample X

, X

,... , X

are deﬁned as indices of ordered

sample

r(X

), r(X

),... , r(X

For example,

ranks([10 20 25 7])

%ans = 2 3 4 1

The function ranks.m is

function r = ranks(data, glob)

%--------------------------------------

if nargin < 2

glob = 1;

end

482 12 Distribution-Free Tests

shape = size(data);

if glob == 1

data=data(:);

end

% Ties ranked from UptoDown

[ irrelevant , indud ] = sort(data);

[ irrelevant , rUD ] = sort(indud);

% Ties ranked from RtoL

[ irrelevant , inddu ] = sort(flipud(data));

[ irrelevant , rDU ] = sort(inddu);

% Averages ranks of ties, keeping ranks

% of no-tie-observations the same

r = (rUD + flipud(rDU))./2;

r = reshape(r,shape);

For example, when the input is a matrix, the optional parameter glob = 1

produces global ranking, while for glob not equal to 1, columnwise ranking is

performed.

%a =

% 0.8147 0.9134 0.2785 0.9649

% 0.9058 0.6324 0.5469 0.1576

% 0.1270 0.0975 0.9575 0.9706

ranks(a)

% ans =

% 7 9 4 11

% 8 6 5 3

% 2 1 10 12

ranks(a,2)

% ans =

% 2 3 1 2

% 3 2 2 1

% 1 1 3 3

In the case of ties, it is customary to average the tied rank values. The

script

ranks.m does just that:

ranks([2 1 7 1 15 9])

%ans = 3.0000 1.5000 4.0000 1.5000 6.0000 5.0000

Here r(2) = 3, r(1) = 1.5, r(7) =4, and so on. Note that 1 appears twice and

ranks 1 and 2 are averaged. In the case

ranks([9 1 7 1 9 9])

%ans = 5.0000 1.5000 3.0000 1.5000 5.0000 5.0000

the ranks of three 9s are 4, 5, and 6, which are averaged to 5.

Suppose that a random sample from continuous distribution X

,... , X

ranked and that R

= r(X

), i = 1,. .., n are the ranks. Ranks R

are random

variables with discrete uniform distribution (p. 141). The properties of integer

sums lead to the following properties for ranks:

12.4 Wilcoxon Signed-Rank Test 483

E(R

) =

j=1

n +1

E(R

) =

j=1

n(n +1)(2n +1)

(n +1)(2n +1)

Var (R

) =

−1

These relationships follow from the fact that for a random sample, ranks are

distributed as discrete uniform, namely, for any i,

P(R

= j) =

, 1

≤ j ≤n.

12.4 Wilcoxon Signed-Rank Test

More powerful than the sign test is Wilcoxon’s signed-rank test (Wilcoxon,

1945), where, in addition to signs, particular ranks are taken into account.

Let the paired sample (X

), i = 1,..., n be observed and let D

= X

−

, i =1,..., n be the differences. In a two-sample problem, we are interested

in testing that the true mean of the differences is 0.

It is also possible to consider a one-sample scenario in which testing the

hypothesis about the median med is of interest. Here H

: med = m

is tested

versus the one- or two-sided alternative. Then observations X

, i =1, . .., n are

compared to m

, and the differences are D

= X

−m

, i =1, . .., n.

The only assumption is that the distribution of the differences D

, i =

1,... , n is symmetric about 0. This implies that positive and negative differ-

ences are equally likely. For this test, the absolute values of the differences

(

|,|D

|,... , |D

|) are ranked. Let r(|D

|), r(|D

|),..., r(|D

|) be the ranks of

the differences.

Under H

, the expectations of the sum of positive differences and the sum

of negative differences should be equal. Deﬁne

i=1

r(|D

and

−

i=1

(1 −S

) r(|D

|),

484 12 Distribution-Free Tests

where S

= 1 if D

> 0 and S

= 0 if D

< 0. Cases where D

= 0 are ties and

are ignored. Thus, W

−

is the sum of all ranks, and in the case of no ties

is equal to

i = n(n +1)/2. The statistic for the WSiRT is the difference

between the ranks of positive differences and ranks of negative differences:

−W

−

i=1

r(|D

|)S

−n(n +1)/2.

Rule: For the WSiRT, it is suggested that a large-sample approximation

should be used for W. If the samples are from the same population, the differ-

ences should be well mixed, and the sum of the ranks of positive differences

should be close to the sum of the ranks of negative differences. Thus, in this

case,

E(W) = 0 and Var (W) =

(r(|D

) =

= n(n +1)(2n +1)/6 under H

and no ties in differences. The statistic

Var (W)

has an approximately standard normal distribution, so

normcdf can be used to

evaluate the p-values of the observed statistic W with respect to a particular

alternative (see the m-ﬁle

wsirt.m).

function [W, Z, p] = wsirt( data1, data2, alt )

% -----------------------------------------------------

% WILCOXON SIGNED RANK TEST

% Input: data1, data2 - first and second sample

% alt - code for alternative hypothesis;

% -1 mu1<mu2; 0 mu1 ne mu2; and 1 mu1>mu2

% Output: W - sum of all signed ranks

% Z - standardized W but adjusted for the ties

% p - p-value for testing equality of distribs

% (equality of locations) against the

% alternative specified by the input alt

% Example of use:

% > dat1=[1 3 2 4 3 5 5 4 2 3 4 3 1 7 6 6 5 4 5 8 7];

% > dat2=[2 5 4 3 4 3 2 2 1 2 3 2 3 4 3 2 3 4 4 3 5];

% > [srs, tstat, pval] = wsirt(dat1, dat2, 1)

% Needs: M-FILE ranks.m (ranking procedure)

%------------------------------------------------------

data1 = data1(:)’ ; % convert sample 1 to a row vector

data2 = data2(:)’ ; % convert sample 2 to a row vector

if length(data1) ~= length(data2)

error(’Sample sizes should coincide’)

end

difs = data1 - data2;

difs = difs( difs ~= 0); % exclude ties

rank

all = ranks(abs(difs));

signs = 2.

(difs > 0)-1;

sig

ranks = signs .

rank

all;

W = sum( sig

ranks ); %sum of all signed ranks

12.4 Wilcoxon Signed-Rank Test 485

W2 = sum( ( sig

ranks.^2 ) );

Z = W/sqrt(W2);

cc = 1/sqrt(W2); %continuity correction

%-------------------------- alternatives -------------

% alt == 0 for two sided;

% alt == -1 for mu1 < mu2; alt == 1 for mu1 > mu2.

if alt == 0

p = 2

normcdf(-abs( Z ) + cc);

elseif alt == -1

p = normcdf( Z + cc);

elseif alt == 1

p = normcdf( -Z + cc);

else

error(’Input "alt" should be either 0,-1,or 1.’)

end

Example 12.2. Identical Twins. This data set was discussed in Conover

(1999). Twelve pairs of identical twins underwent psychological tests to mea-

sure the amount of aggressiveness in each person’s personality. We are inter-

ested in comparing the twins to each other to see if the ﬁrst-born twin tends to

be more aggressive than the other. The results are as follows (the higher score

indicates more aggressiveness).

First-born, X

86 71 77 68 91 72 77 91 70 71 88 87

Second twin, Y

88 77 76 64 96 72 65 90 65 80 81 72

The hypotheses are: H

: the mean aggressiveness scores for the two twins

are the same, that is,

E(X

) = E(Y

), and H

: the ﬁrst-born twin tends to be

more aggressive than the other, i.e.,

E(X

) > E(Y

). The WSiRT is appropriate

if we assume that D

= X

−Y

are independent and symmetric. Below is the

output of

wsirt, where the T statistic has been used.

fb = [86 71 77 68 91 72 77 91 70 71 88 87];

sb = [88 77 76 64 96 72 65 90 65 80 81 72];

[w1, z1, p] = wsirt(fb, sb, 1)

%w1 = 17 %value of T

%z1 = 0.7565 %value of Z

%p = 0.2382 %p-value of the test

Note that the test failed to reject H

The WSiRT can be used to test the hypothesis of location H

: µ =µ

using

a single sample, as in a one-sample t-test. The differences in the WSiRT are

−µ

, X

−µ

,... , X

−µ

instead of X

−Y

, X

−Y

,... , X

−Y

, as in the

two-sample WSiRT.

Example 12.3. In the Moon Illusion (Example 9.4), we tested H

: µ =1 against

: µ >1.

486 12 Distribution-Free Tests

moon = [1.73 1.06 2.03 1.40 0.95 1.13 1.41 1.73 1.63 1.56];

mu0 = 1;

mu0vec = mu0

ones(size(moon));

[w, z, p]=wsirt(moon, mu0vec, 1)

%w = 53

%z = 2.7029

%p = 0.0040

Compared to the t-test where the p-value was found to be pval = 9.9885e-04,

the WSiRT still rejects H

even though the p-value is higher, p = 0.004.

Equivalently, the WSiRT can be based on the sum of the ranks of posi-

tive differences only (or, equivalently, the sum of the ranks of negative dif-

ferences only). In that case, under H

, EW

= n(n +1)/4 and Var (W

) =

n(n +1)(2n +1)/24, leading to

−n(n +1)/4

n(n +1)(2n +1)/24

12.5 Wilcoxon Sum Rank Test and

Wilcoxon–Mann–Whitney Test

The Wilcoxon sum rank test (WSuRT) and Wilcoxon–Mann–Whitney test

(WMW) are equivalent tests and we will discuss only the former, WSuRT. The

tests are named after statisticians shown in Fig. 12.1a-c.

(a) (b) (c)

Fig. 12.1 (a) Frank Wilcoxon (1892–1965), (b) Henry Berthold Mann (1905–2000), and (c)

Donald Ransom Whitney (1915–2001).

The WSuRT is often used in place of a two-sample t-test when the popula-

tions being compared are independent, but possibly not normally distributed.

12.5 Wilcoxon Sum Rank Test and Wilcoxon–Mann–Whitney Test 487

An example of the sort of data for which this test could be used is responses

on a Likert scale (e.g., 1 = much worse, 2 = worse, 3 = no change, 4 = better,

5 = much better). It would be inappropriate to use the t-test for such data be-

cause for ordinal data the normality assumption does not hold. The WSuRT

tells us more generally whether the groups are homogeneous or if one group is

“better’ than the other. More generally, the basic null hypothesis of the WSuRT

is that the two populations are equal. That is, H

: F

(x) = F

(x). When stated

in this way, this test assumes that the shapes of the distributions are similar,

which is not a stringent assumption.

Let X

= X

,... , X

and Y = Y

,... , Y

be two samples of sizes n

and n

respectively, from the populations that we want to compare. Assume that the

samples are put together and that n

= n

+ n

ranks are assigned to their

concatenation. The test statistic W

is the sum of ranks (1 to n) corresponding

to the ﬁrst sample, X. For example, if X

= 1, X

= 13, X

= 7, X

= 9, and

=2,Y

=0,Y

=18, then the value of W

is 2 +4 +5 +6 =17.

If the two populations have the same distribution, then the sum of the

ranks of the ﬁrst sample and those in the second sample should be close, rela-

tive to their sample sizes. The WSuRT statistic is

i=1

(X ,Y ),

where

(

) is an indicator function deﬁned as 1 if the

th ranked obser-

vation is from the ﬁrst sample and as 0 if the observation is from the second

sample.

For example, for X

= 1, X

= 13, X

= 7, X

= 9 and Y

= 2,Y

= 0,Y

= 18,

=0, S

=1, S

=0, S

=1, S

=0. Thus

=1 ×0 +2 ×1 +3 ×0 +4 ×1 +5 ×1 +6 ×1 +7 ×0 =2 +4 +5 +6 =17.

If there are no ties, then under H

E(W

) =

(n +1)

and

Var (W

) =

(n +1)

The statistic W

achieves its minimum when the ﬁrst sample is entirely

smaller than the second, and its maximum when the opposite occurs:

minW

i=1

i =

+1)

, maxW

i=n−n

i =

(2n −n

+1)

For the statistic W

a normal approximation holds:

∼N

(n +1)

488 12 Distribution-Free Tests

A better approximation is

P(W

≤w) ≈Φ(x) +φ(x)(x

−3x)

20n

(n +1)

where

φ(x) and Φ(x) are the PDF and CDF of a standard normal distribu-

tion, respectively, and x

= (w −E(W

) +0.5)/

Var (W

). This approximation is

satisfactory for n

>5 and n

>5 if there are no ties.

function [W, Z, p] = wsurt( data1, data2, alt )

% --------------------------------------------------------

% WILCOXON SUM RANK TEST

% Input: data1, data2 - first and second sample

% alt - code for alternative hypothesis;

% -1 mu1<m2; 0 mu1 ne m2; and 1 mu1>mu2

% Output: W - sum of the ranks for the first sample. If

% there is no ties, the standardization by ER &

% Var R allows using standard normal quantiles

% as long as sample sizes are larger than 15-20.

% Z - standardized R but adjusted for the ties

% p - p-value for testing equality of distributions

% (equality of locations) against the alternative

% specified by input "alt"

% Example of use:

% > dat1=[1 3 2 4 3 5 5 4 2 3 4 3 1 7 6 6 5 4 5 8 7 3 3 4];

% > dat2=[2 5 4 3 4 3 2 2 1 2 3 2 3 4 3 2 3 4 4 3 5];

% > [sumranks1, tstat, pval] = wsurt(dat1, dat2, 1)

% Needs: M-FILE ranks.m (ranking procedure)

%-----------------------------------------------------------

data1 = data1(:)’ ; %convert sample 1 to a row vector

n1 = length( data1 ); %n1 - size of first sample, data1

data2 = data2(:)’ ; %convert sample 2 to a row

n2 = length( data2 ); %n2 - size of second sample, data2

n =n1+ n2; %n is the total sample size

mergeboth = [ data1 data2 ];

ranksall = ranks( mergeboth ); %ranks of merged observations

W2 = sum( ( ranksall.^2 ) ); %sum of all ranks squared

% needed to make adjustment for the ties; if no ties are

% present, this sum is equal to the sum of squares of the

% first n integers: n(n+1)(2 n+1)/6.

ranksdata1 = ranksall( :, 1:n1); %ranks of first sample

W = sum( ranksdata1 ); % statistic for WMW

%--------------------------------------------------------

Z = (W - n1

(n+1)/2 )/sqrt( n1

W2/(n

(n-1)) ...

- n1

(n+1)^2/(4

(n-1)));

% Z is approximately standard normal and approximation is

% quite good if n1,n2 > 15. Since W ranges over integers

% and half integers, a continuity correction, cc, may be

% used for improving the accuracy of p-values.

cc = 0.25/sqrt( n1

W2/(n

(n-1)) - ...

(n+1)^2/(4

(n-1)));