could wind up with different conclusions. Reporting the significance level at which we are
carrying out the test solves this problem to some degree, but it does not completely remove
the problem.
To provide more information, we can ask the following question: What is the largest
significance level at which we could carry out the test and still fail to reject the null
hypothesis? This value is known as the p-value of a test (sometimes called the prob-value).
Compared with choosing a significance level ahead of time and obtaining a critical value,
computing a p-value is somewhat more difficult. But with the advent of quick and
inexpensive computing, p-values are now fairly easy to obtain.
As an illustration, consider the problem of testing $H_0: \mu = 0$ in a
$\text{Normal}(\mu, \sigma^2)$ population. Our test statistic in this case is
$T = \sqrt{n}\,\bar{Y}/S$, and we assume that n is large enough
to treat T as having a standard normal distribution under $H_0$. Suppose that the observed
value of T for our sample is $t = 1.52$. (Note how we have skipped the step of choosing a
significance level.) Now that we have seen the value t, we can find the largest significance
level at which we would fail to reject $H_0$. This is the significance level associated with
using t as our critical value. Because our test statistic T has a standard normal distribution
under $H_0$, we have

$$\text{p-value} = P(T > 1.52 \mid H_0) = 1 - \Phi(1.52) \approx .065, \tag{C.40}$$

where $\Phi(\cdot)$ denotes the standard normal cdf. In other words, the p-value in this example
is simply the area to the right of 1.52, the observed value of the test statistic, in a standard
normal distribution. See Figure C.7 for illustration.
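
The calculation in (C.40) can be reproduced numerically. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` as the standard normal cdf; the function name `one_sided_p_value` is a hypothetical label introduced here for illustration and does not come from the text.

```python
from statistics import NormalDist


def one_sided_p_value(t: float) -> float:
    """Area to the right of t under the standard normal density: 1 - Phi(t)."""
    return 1.0 - NormalDist().cdf(t)


# Observed value of the test statistic from the example.
p = one_sided_p_value(1.52)
print(round(p, 3))
```

Computed to three decimals, the tail area comes out at roughly .064, which the text reports as .065; the difference is immaterial for the discussion that follows.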
Because p-value = .065, the largest significance level at which we can carry out this
test and fail to reject is 6.5%. If we carry out the test at a level below 6.5% (such as at
5%), we fail to reject $H_0$. If we carry out the test at a level larger than 6.5% (such as 10%),
we reject $H_0$. With the p-value at hand, we can carry out the test at any level.
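
To see how a single p-value settles the test at every level, the comparison can be sketched as below. This is an illustrative sketch only: the helper `reject_null` and the particular levels tried are assumptions made here, and the p-value is recomputed from the standard normal cdf in Python's standard library.

```python
from statistics import NormalDist


def reject_null(p_value: float, alpha: float) -> bool:
    """Reject H0 at significance level alpha when the p-value is below alpha."""
    return p_value < alpha


# p-value for the observed statistic t = 1.52 (one-sided, right tail).
p_value = 1.0 - NormalDist().cdf(1.52)

# Illustrative significance levels; any other level could be checked the same way.
for alpha in (0.01, 0.05, 0.10):
    decision = "reject" if reject_null(p_value, alpha) else "fail to reject"
    print(f"level {alpha:.2f}: {decision} H0")
```

The loop reports a failure to reject at the 1% and 5% levels and a rejection at the 10% level, matching the conclusions stated above.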
The p-value in this example has another useful interpretation: it is the probability that
we observe a value of T as large as 1.52 when the null hypothesis is true. If the null
hypothesis is actually true, we would observe a value of T as large as 1.52 due to chance
only 6.5% of the time. Whether this is small enough to reject $H_0$ depends on our tolerance
for a Type I error. The p-value has a similar interpretation in all other cases, as we
will see.
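
The "due to chance" interpretation can be checked by simulation. The sketch below is a rough illustration under assumptions that are not in the text: the sample size (n = 100), the number of replications, the seed, and the population standard deviation (set to 1, which does not affect T) are all arbitrary choices, and `simulate_exceedance` is a hypothetical name. It draws repeated samples with the null hypothesis true and records how often T exceeds 1.52.

```python
import math
import random


def simulate_exceedance(t_obs: float, n: int = 100, reps: int = 5000, seed: int = 0) -> float:
    """Fraction of samples drawn under H0 (mu = 0) whose statistic T exceeds t_obs."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        # Sample of size n from a Normal(0, 1) population, so H0 holds.
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ybar = sum(y) / n
        # Sample standard deviation with the n - 1 divisor.
        s = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
        t = math.sqrt(n) * ybar / s
        if t > t_obs:
            count += 1
    return count / reps


frac = simulate_exceedance(1.52)
print(frac)
```

With these settings the simulated fraction should land close to the p-value from the example, though it will vary with the seed and the number of replications.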
Generally, small p-values are evidence against $H_0$, since they indicate that the
outcome of the data occurs with small probability if $H_0$ is true. In the previous example,
if t had been a larger value, say, $t = 2.85$, then the p-value would be
$1 - \Phi(2.85) \approx .002$. This means that, if the null hypothesis were true, we would observe a
value of T as large as 2.85 with probability .002. How do we interpret this? Either we
obtained a very unusual sample or the null hypothesis is false. Unless we have a very small
tolerance for Type I error, we would reject the null hypothesis. On the other hand, a large
p-value is weak evidence against $H_0$. If we had gotten $t = .47$ in the previous example,
then p-value $= 1 - \Phi(.47) \approx .32$. Observing a value of T larger than .47 happens with
probability .32, even when $H_0$ is true; this is large enough so that there is insufficient doubt
about $H_0$, unless we have a very high tolerance for Type I error.
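
The three cases discussed (t = .47, 1.52, and 2.85) can be placed side by side to show how the p-value shrinks as the observed statistic grows. This is a small sketch using Python's standard library normal cdf; the variable names are illustrative assumptions and not part of the text.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# One-sided (right-tail) p-values for the three observed statistics in the text.
p_values = {t: 1.0 - std_normal.cdf(t) for t in (0.47, 1.52, 2.85)}

for t, p in p_values.items():
    print(f"t = {t:.2f}: p-value = {p:.3f}")
```

The output orders the p-values from large to small as t increases, which is the pattern the discussion above relies on: larger observed statistics provide stronger evidence against the null hypothesis.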
Appendix C Fundamentals of Mathematical Statistics 795