
488 12 Distribution-Free Tests
A better approximation is

$$
P(W_n \le w) \approx \Phi(x) + \phi(x)\,(x^3 - 3x)\,\frac{n_1^2 + n_2^2 + n_1 n_2 + n}{20\, n_1 n_2\, (n + 1)},
$$

where $\phi(x)$ and $\Phi(x)$ are the PDF and CDF of a standard normal distribution, respectively, and $x = (w - E(W_n) + 0.5)/\sqrt{\mathrm{Var}(W_n)}$. This approximation is satisfactory for $n_1 > 5$ and $n_2 > 5$ if there are no ties.
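The corrected approximation above could be evaluated along the following lines. This is only a minimal sketch under stated assumptions: there are no ties, $n = n_1 + n_2$, and the null moments are $E(W_n) = n_1(n+1)/2$ and $\mathrm{Var}(W_n) = n_1 n_2 (n+1)/12$. The sample sizes and the value of `w` are made-up illustration values, and the variable names (`EW`, `VarW`, `edge_term`, and so on) are hypothetical and do not come from `wsurt`.

```matlab
% Sketch: corrected normal approximation to P(W <= w), assuming no ties.
% All numbers below are hypothetical illustration values.
n1 = 8;  n2 = 7;  n = n1 + n2;    % sample sizes (both > 5)
w  = 50;                          % observed rank sum of the first sample

EW   = n1*(n + 1)/2;              % E(W) under H0
VarW = n1*n2*(n + 1)/12;          % Var(W) under H0, valid only without ties

x = (w - EW + 0.5)/sqrt(VarW);    % standardization with continuity correction

Phi = 0.5*erfc(-x/sqrt(2));       % standard normal CDF via base-MATLAB erfc
phi = exp(-x^2/2)/sqrt(2*pi);     % standard normal PDF

% correction term: phi(x)(x^3 - 3x)(n1^2 + n2^2 + n1*n2 + n)/(20 n1 n2 (n+1))
edge_term = phi*(x^3 - 3*x)*(n1^2 + n2^2 + n1*n2 + n)/(20*n1*n2*(n + 1));

p_normal    = Phi;                % plain normal approximation
p_corrected = Phi + edge_term;    % approximation with the correction term
fprintf('normal: %.4f   corrected: %.4f\n', p_normal, p_corrected);
```

Writing the normal CDF with `erfc` keeps the sketch free of toolbox dependencies. With ties present, the no-ties variance used here would no longer be appropriate.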
function [W, Z, p] = wsurt( data1, data2, alt )
% --------------------------------------------------------
% WILCOXON SUM RANK TEST
% Input: data1, data2 - first and second sample
% alt - code for alternative hypothesis;
% -1 mu1<mu2; 0 mu1 ne mu2; and 1 mu1>mu2
% Output: W - sum of the ranks for the first sample. If
% there are no ties, the standardization by E(W) &
% Var(W) allows using standard normal quantiles
% as long as sample sizes are larger than 15-20.
% Z - standardized W but adjusted for the ties
% p - p-value for testing equality of distributions
% (equality of locations) against the alternative
% specified by input "alt"
% Example of use:
% > dat1=[1 3 2 4 3 5 5 4 2 3 4 3 1 7 6 6 5 4 5 8 7 3 3 4];
% > dat2=[2 5 4 3 4 3 2 2 1 2 3 2 3 4 3 2 3 4 4 3 5];
% > [sumranks1, tstat, pval] = wsurt(dat1, dat2, 1)
%
% Needs: M-FILE ranks.m (ranking procedure)
%-----------------------------------------------------------
data1 = data1(:)'; %convert sample 1 to a row vector
n1 = length( data1 ); %n1 - size of first sample, data1
data2 = data2(:)'; %convert sample 2 to a row vector
n2 = length( data2 ); %n2 - size of second sample, data2
n = n1 + n2; %n is the total sample size
mergeboth = [ data1 data2 ];
ranksall = ranks( mergeboth ); %ranks of merged observations
W2 = sum( ranksall.^2 ); %sum of the squared ranks
% needed to make adjustment for the ties; if no ties are
% present, this sum is equal to the sum of squares of the
% first n integers: n(n+1)(2 n+1)/6.
ranksdata1 = ranksall( :, 1:n1); %ranks of first sample
W = sum( ranksdata1 ); % statistic for WMW
%--------------------------------------------------------
Z = (W - n1*(n+1)/2) / sqrt( n1*n2*W2/(n*(n-1)) ...
    - n1*n2*(n+1)^2/(4*(n-1)) );
% Z is approximately standard normal and approximation is
% quite good if n1,n2 > 15. Since W ranges over integers
% and half integers, a continuity correction, cc, may be
% used for improving the accuracy of p-values.
cc = 0.25 / sqrt( n1*n2*W2/(n*(n-1)) - ...
    n1*n2*(n+1)^2/(4*(n-1)) );