A Modern Introduction to Probability and Statistics, Understanding Why and How - Dekking, Kraaikamp, Lopuhaa, Meester (Современное введение в теорию вероятностей и статистику


24 More on confidence intervals

distribution of T depends on θ, just as that of $X_1, \ldots, X_n$ does. In some cases it might be possible to find functions g(θ) and h(θ) such that

$$P(g(\theta) < T < h(\theta)) = 1 - \alpha \quad \text{for every value of } \theta. \tag{24.3}$$

If this is so, then conﬁdence statements about θ can be made. In more

special cases, for example if g and h are strictly increasing, the inequalities

g(θ) < T < h(θ) can be rewritten as

$$h^{-1}(T) < \theta < g^{-1}(T),$$

and then (24.3) is equivalent to

$$P\left(h^{-1}(T) < \theta < g^{-1}(T)\right) = 1 - \alpha \quad \text{for every value of } \theta.$$

Checking with the confidence interval definition, we see that the last statement implies that $(h^{-1}(t), g^{-1}(t))$ is a 100(1−α)% confidence interval for θ.

24.3 One-sided conﬁdence intervals

Suppose you are in charge of a power plant that generates and sells electricity,

and you are about to buy a shipment of coal, say a shipment of the Daw Mill

coal identiﬁed as 258GB41 earlier. You plan to buy the shipment if you are

conﬁdent that the gross caloriﬁc content exceeds 31.00 MJ/kg. At the end of

Section 23.2 we obtained for the gross caloriﬁc content the 95% conﬁdence

interval (30.946, 31.067): based on the data we are 95% conﬁdent that the

gross caloriﬁc content is higher than 30.946 and lower than 31.067.

In the present situation, however, we are only interested in the lower bound:

we would prefer a conﬁdence statement of the type “we are 95% conﬁdent

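that the gross calorific content exceeds 31.00.” Modifying equation (23.4) we find

$$P\left(\frac{\bar{X}_n - \mu}{S_n/\sqrt{n}} < t_{n-1,\alpha}\right) = 1 - \alpha,$$

which is equivalent to

$$P\left(\bar{X}_n - t_{n-1,\alpha}\frac{S_n}{\sqrt{n}} < \mu\right) = 1 - \alpha.$$

We conclude that

$$\left(\bar{x}_n - t_{n-1,\alpha}\frac{s_n}{\sqrt{n}},\ \infty\right)$$

is a 100(1 − α)% one-sided confidence interval for µ. For the Daw Mill coal, using α = 0.05, with $t_{21,0.05} = 1.721$ this results in:

$$\left(31.012 - 1.721\,\frac{0.1294}{\sqrt{22}},\ \infty\right) = (30.964, \infty).$$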


We see that because “all uncertainty may be put on one side,” the lower

bound in the one-sided interval is higher than that in the two-sided one,

though still below 31.00. Other situations may require a conﬁdence upper

bound. For example, if the caloriﬁc value is below a certain number you can

try to negotiate a lower price.
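The one-sided bounds discussed here can be checked numerically. The following is a minimal Python sketch, not part of the original text. The function names `lower_confidence_bound` and `upper_confidence_bound` are illustrative, the sample size n = 22 is inferred from the 21 degrees of freedom, and the critical value is passed in by hand (1.721, as quoted in the text) rather than computed.

```python
import math


def lower_confidence_bound(xbar, s, n, t_crit):
    """Lower end of the one-sided interval (xbar - t_crit * s / sqrt(n), infinity)."""
    return xbar - t_crit * s / math.sqrt(n)


def upper_confidence_bound(xbar, s, n, t_crit):
    """Upper end of the one-sided interval (-infinity, xbar + t_crit * s / sqrt(n))."""
    return xbar + t_crit * s / math.sqrt(n)


# Daw Mill coal: sample mean 31.012, sample standard deviation 0.1294,
# n = 22 (assumed from the 21 degrees of freedom), t_{21, 0.05} = 1.721.
coal_lower = lower_confidence_bound(31.012, 0.1294, 22, 1.721)
print(f"95% lower confidence bound: {coal_lower:.4f}")
```

Passing the critical value as an argument keeps the sketch free of external dependencies; a statistics library could supply the t quantile instead.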

The deﬁnition of conﬁdence intervals (page 343) can be extended to include

one-sided confidence intervals as well. If we have a sample statistic $L_n$ such that

$$P(L_n < \theta) = \gamma$$

for every value of the parameter of interest θ, then $(l_n, \infty)$ is called a 100γ% one-sided confidence interval for θ. The number $l_n$, the realization of $L_n$, is sometimes called a 100γ% lower confidence bound for θ. Similarly, $U_n$ with $P(\theta < U_n) = \gamma$ for every value of θ yields the one-sided confidence interval $(-\infty, u_n)$, and $u_n$ is called a 100γ% upper confidence bound.

Quick exercise 24.3 Determine the 99% upper conﬁdence bound for the

gross caloriﬁc value of the Daw Mill coal.

24.4 Determining the sample size

The narrower the conﬁdence interval the better (why?). As a general prin-

ciple, we know that more accurate statements can be made if we have more

measurements. Sometimes, an accuracy requirement is set, even before data

are collected, and the corresponding sample size is to be determined. We pro-

vide an example of how to do this and note that this generally can be done,

but the actual computation varies with the type of conﬁdence interval.

Consider the question of the caloriﬁc content of coal once more. We have a

shipment of coal to test and we want to obtain a 95% conﬁdence interval,

but it should not be wider than 0.05 MJ/kg, i.e., the lower and upper bound

should not diﬀer more than 0.05. How many measurements do we need?

We answer this question for the case when ISO method 1928 is used, whence

we may assume that measurements are normally distributed with standard

deviation σ = 0.1. When the desired confidence level is 1 − α, the width of the confidence interval will be

$$2 \cdot z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

Requiring that this is at most w means finding the smallest n that satisfies

$$2 z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le w$$

or

$$n \ge \left(\frac{2 z_{\alpha/2}\,\sigma}{w}\right)^2.$$

For the example: w = 0.05, σ = 0.1, and $z_{0.025} = 1.96$; so

$$n \ge \left(\frac{2 \cdot 1.96 \cdot 0.1}{0.05}\right)^2 = 61.5,$$

that is, we should perform at least 62 measurements.
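The sample size rule can be turned into a small calculation. This is a minimal sketch under the same assumptions as the text (normal measurements with known σ). The function name `required_sample_size` is illustrative, and the critical value $z_{\alpha/2}$ is obtained from Python's standard-library `statistics.NormalDist` rather than from a table.

```python
import math
from statistics import NormalDist


def required_sample_size(width, sigma, confidence):
    """Smallest n with 2 * z_{alpha/2} * sigma / sqrt(n) <= width."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    return math.ceil((2.0 * z * sigma / width) ** 2)


# The coal example: 95% interval, at most 0.05 MJ/kg wide, sigma = 0.1.
n_95 = required_sample_size(width=0.05, sigma=0.1, confidence=0.95)
print(n_95)
```

Rounding up with `math.ceil` reflects that n must be an integer and that the width requirement is an upper limit.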

In case σ is unknown, we somehow have to estimate it, and then the method

can only give an indication of the required sample size. The standard deviation

as we (afterwards) estimate it from the data may turn out to be quite diﬀerent,

and the obtained conﬁdence interval may be smaller or larger than intended.

Quick exercise 24.4 What is the required sample size if we want the 99%

conﬁdence interval to be 0.05 MJ/kg wide?

24.5 Solutions to the quick exercises

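24.1 We need to solve

$$\left(\frac{80}{200} - p\right)^2 - \frac{(2)^2}{200}\,p(1-p) < 0, \quad\text{or}\quad 1.02\,p^2 - 0.82\,p + 0.16 < 0.$$

The solutions are:

$$p_{1,2} = \frac{-(-0.82) \pm \sqrt{(-0.82)^2 - 4 \cdot 1.02 \cdot 0.16}}{2 \cdot 1.02} = 0.4020 \pm 0.0686,$$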

so the conﬁdence interval is (0.33, 0.47).
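The quadratic in this solution can also be solved programmatically. The sketch below is illustrative only: the function name `wilson_interval` is hypothetical, and the inputs (80 successes in 200 trials, with 2 used as the critical value) are inferred from the coefficients 1.02, 0.82, and 0.16 of the quadratic above.

```python
import math


def wilson_interval(successes, n, z):
    """Solve (successes/n - p)^2 - (z^2/n) * p * (1 - p) < 0 for p."""
    phat = successes / n
    a = 1.0 + z * z / n              # coefficient of p^2
    b = -(2.0 * phat + z * z / n)    # coefficient of p
    c = phat * phat                  # constant term
    root = math.sqrt(b * b - 4.0 * a * c)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)


# Inferred inputs: 80 successes out of 200 trials, critical value 2.
low, high = wilson_interval(80, 200, 2.0)
print(f"({low:.2f}, {high:.2f})")
```

The interval is the set of p between the two roots, because the coefficient of $p^2$ is positive.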

24.2 We should substitute n = 15, t = 23.5, and α = 0.01 into:

$$\left(t + \frac{\ln(\alpha/2)}{n},\ t + \frac{\ln(1-\alpha/2)}{n}\right),$$

which yields

$$\left(23.5 - \frac{5.30}{15},\ 23.5 - \frac{0.0050}{15}\right) = (23.1467, 23.4997).$$

24.3 The upper confidence bound is given by

$$u_n = \bar{x}_n + t_{21,0.01}\frac{s_n}{\sqrt{22}},$$

where $\bar{x}_n = 31.012$, $t_{21,0.01} = 2.518$, and $s_n = 0.1294$. Substitution yields $u_n = 31.081$.


24.4 The confidence level changes to 99%, so we use $z_{0.005} = 2.576$ instead of 1.96 in the computation:

$$n \ge \left(\frac{2 \cdot 2.576 \cdot 0.1}{0.05}\right)^2 = 106.2,$$

so we need at least 107 measurements.

24.6 Exercises

24.1  Of a series of 100 (independent and identical) chemical experiments,

70 were concluded successfully. Construct a 90% confidence interval for the

success probability of this type of experiment.

24.2 In January 2002 the Euro was introduced and soon after stories started

to circulate that some of the Euro coins would not be fair coins, because the

“national side” of some coins would be too heavy or too light (see, for example,

the New Scientist of January 4, 2002, but also national newspapers of that

date).

a. A French 1 Euro coin was tossed six times, resulting in 1 heads and 5 tails.

Is it reasonable to use the Wilson method, introduced in Section 24.1, to

construct a conﬁdence interval for p?

b. A Belgian 1 Euro coin was tossed 250 times: 140 heads and 110 tails.

Construct a 95% conﬁdence interval for the probability of getting heads

with this coin.

24.3 In Exercise 23.1, what sample size is needed if we want a 99% conﬁdence

interval for µ at most 1 ml wide?

24.4  Recall Exercise 23.3 and the 10 bags of cement that should each weigh

94 kg. The average weight was 93.5 kg, with sample standard deviation 0.75.

a. Based on these data, how many bags would you need to sample to make

a 90% conﬁdence interval that is 0.1 kg wide?

b. Suppose you actually do measure the required number of bags and con-

struct a new conﬁdence interval. Is it guaranteed to be at most 0.1 kg

wide?

24.5 Suppose we want to make a 95% conﬁdence interval for the probability

of getting heads with a Dutch 1 Euro coin, and it should be at most 0.01

wide. To determine the required sample size, we note that the probability of

getting heads is about 0.5. Furthermore, if X has a Bin (n, p) distribution,

with n large and p ≈ 0.5, then

$$\frac{X - np}{\sqrt{n/4}}$$

is approximately standard normal.

a. Use this statement to derive that the width of the 95% confidence interval for p is approximately

$$\frac{z_{0.025}}{\sqrt{n}}.$$

Use this width to determine how large n should be.

b. The coin is thrown the number of times just computed, resulting in 19 477

times heads. Construct the 95% conﬁdence interval and check whether the

required accuracy is attained.

24.6  Environmentalists have taken 16 samples from the wastewater of a

chemical plant and measured the concentration of a certain carcinogenic sub-

stance. They found ¯x

=2.24 (ppm) and s

=1.12, and want to use these

data in a lawsuit against the plant. It may be assumed that the data are a

realization of a normal random sample.

a. Construct the 97.5% one-sided conﬁdence interval that the environmen-

talists made to convince the judge that the concentration exceeds legal

limits.

b. The plant management uses the same data to construct a 97.5% one-

sided conﬁdence interval to show that concentrations are not too high.

Construct this interval as well.

24.7 Consider once more the Rutherford-Geiger data as given in Section 23.4.

Knowing that the number of α-particle emissions during an interval has a

Poisson distribution, we may see the data as observations from a Pois(µ)

distribution. The central limit theorem tells us that the average $\bar{X}_n$ of a large number of independent Pois(µ) random variables approximately has a normal distribution and hence that

$$\frac{\bar{X}_n - \mu}{\sqrt{\mu}/\sqrt{n}}$$

has a distribution that is approximately N(0, 1).

a. Show that the large sample 95% confidence interval contains those values of µ for which

$$(\bar{x}_n - \mu)^2 \le (1.96)^2\,\frac{\mu}{n}.$$

b. Use the result from a to construct the large sample 95% conﬁdence interval

based on the Rutherford-Geiger data.

c. Compare the result with that of Exercise 23.9 b. Is this surprising?

24.8  Recall Exercise 23.5 about the 1500 m speed-skating results in the 2002

Winter Olympic Games. If there were no outer lane advantage, the number


out of the 23 completed races won by skaters starting in the outer lane would

have a Bin (23,p) distribution with p =1/2, because of the lane assignment

by lottery.

a. Of the 23 races, 15 were won by the skater starting in the outer lane. Use

this information to construct a 95% conﬁdence interval for p by means

of the Wilson method. If you think that n = 23 is probably too small to

use a method based on the central limit theorem, we agree. We should be

careful with conclusions we draw from this conﬁdence interval.

b. The question posed earlier “Is there an outer lane advantage?” implies that

a one-sided conﬁdence interval is more suitable. Construct the appropriate

95% one-sided conﬁdence interval for p by ﬁrst constructing a 90% two-

sided conﬁdence interval.

24.9  Suppose we have a dataset x

,...,x

that may be modeled as the

realization of a random sample X

,...,X

from a U (0,θ) distribution, with

θ unknown. Let M =max{X

,...,X

a. Show that for 0 ≤ t ≤ 1



≤ t



= t

b. Use α = 0.1 and solve



≤ c





≤ c



α.

c. Suppose the realization of M is m = 3. Construct the 90% conﬁdence

interval for θ.

d. Derive the general expression for a conﬁdence interval of level 1 −α based

on a sample of size n.

24.10 Suppose we have a dataset $x_1, \ldots, x_n$ that may be modeled as the realization of a random sample $X_1, \ldots, X_n$ from an Exp(λ) distribution, where λ is unknown. Let $S_n = X_1 + \cdots + X_n$.

a. Check that $\lambda S_n$ has a Gam(n, 1) distribution.

b. The following quantiles of the Gam(20, 1) distribution are given: $q_{0.05} = 13.25$ and $q_{0.95} = 27.88$. Use these to construct a 90% confidence interval for λ when n = 20.

25 Testing hypotheses: essentials

The statistical methods that we have discussed until now have been devel-

oped to infer knowledge about certain features of the model distribution that

represent our quantities of interest. These inferences often take the form of

numerical estimates, as either single numbers or conﬁdence intervals. How-

ever, sometimes the conclusion to be drawn is not expressed numerically, but

is concerned with choosing between two conﬂicting theories, or hypotheses.

For instance, one has to assess whether the lifetime of a certain type of ball

bearing deviates or does not deviate from the lifetime guaranteed by the man-

ufacturer of the bearings; an engineer wants to know whether dry drilling is

faster or the same as wet drilling; a gynecologist wants to ﬁnd out whether

smoking aﬀects or does not aﬀect the probability of getting pregnant; the Al-

lied Forces want to know whether the German war production is equal to or

smaller than what Allied intelligence agencies reported. The process of formu-

lating the possible conclusions one can draw from an experiment and choosing

between two alternatives is known as hypothesis testing. In this chapter we

start to explore this statistical methodology.

25.1 Null hypothesis and test statistic

We will introduce the basic concepts of hypothesis testing with an exam-

ple. Let us return to the analysis of German war equipment. During World

War II the Allied Forces received reports by the Allied intelligence agencies

on German war production. The numbers of produced tires, tanks, and other

equipment, as claimed in these reports, were a lot higher than indicated by

the observed serial numbers. The objective was to decide whether the actual

produced quantities were smaller than the ones reported.

For simplicity suppose that we have observed tanks with (recoded) serial num-

bers

61 19 56 24 16.


Furthermore, suppose that the Allied intelligence agencies report a production

of 350 tanks.

This is a lot more than we would surmise from the observed

data. We want to choose between the proposition that the total number of

tanks is 350 and the proposition that the total number is smaller than 350.

The two competing propositions are called the null hypothesis, denoted by $H_0$, and the alternative hypothesis, denoted by $H_1$. The way we go about choosing between $H_0$ and $H_1$ is conceptually similar to the way a jury deliberates in a court

trial. The null hypothesis corresponds to the position of the defendant: just

as he is presumed to be innocent until proven guilty, so is the null hypothesis

presumed to be true until the data provide convincing evidence against it.

The alternative hypothesis corresponds to the charges brought against the

defendant.

To decide whether $H_0$ is false we use a statistical model. As argued in Chapter 20 the (recoded) serial numbers are modeled as a realization of random variables $X_1, \ldots, X_5$ representing five draws without replacement from the numbers 1, 2, ..., N. The parameter N represents the total number of tanks. The two hypotheses in question are

$$H_0: N = 350$$

$$H_1: N < 350.$$

If we reject the null hypothesis we will accept $H_1$; we speak of rejecting $H_0$ in favor of $H_1$. Usually, the alternative hypothesis represents the theory or belief that we would like to accept if we do reject $H_0$. This means that we must carefully choose $H_1$ in relation to our interests in the problem at hand. In our example we are particularly interested in whether the number of tanks is less than 350; so we test the null hypothesis against $H_1: N < 350$. If we were interested in whether the number of tanks differs from 350, or is greater than 350, we would test against $H_1: N \neq 350$ or $H_1: N > 350$.

Quick exercise 25.1 In the drilling example from Sections 15.5 and 16.4 the data on drill times for dry drilling are modeled as a realization of a random sample from a distribution with expectation $\mu_1$, and similarly the data for wet drilling correspond to a distribution with expectation $\mu_2$. We want to know whether dry drilling is faster than wet drilling. To this end we test the null hypothesis $H_0: \mu_1 = \mu_2$ (the drill time is the same for both methods). What would you choose for $H_1$?

The next step is to select a criterion based on $X_1, \ldots, X_5$ that provides an indication about whether $H_0$ is false. Such a criterion involves a test statistic.

Footnote to the reported production of 350 tanks: This may seem ridiculous. However, when official German production statistics became available after the war, the average monthly production of tanks during the period 1940–1943 was 342. During the war this number was estimated at 327, whereas Allied intelligence reported 1550! (see [27]).


Test Statistic. Suppose the dataset is modeled as the realization of random variables $X_1, \ldots, X_n$. A test statistic is any sample statistic $T = h(X_1, \ldots, X_n)$, whose numerical value is used to decide whether we reject $H_0$.

In the tank example we use the test statistic

$$T = \max\{X_1, \ldots, X_5\}.$$

Having chosen a test statistic T, we investigate what sort of values T can attain. These values can be viewed on a credibility scale for $H_0$, and we must determine which of these values provide evidence in favor of $H_0$, and which provide evidence in favor of $H_1$. First of all note that if we find a value of T larger than 350, we immediately know that $H_0$ as well as $H_1$ is false. If this happens, we actually should be considering another testing problem, but for the current problem of testing $H_0: N = 350$ against $H_1: N < 350$ such values are irrelevant. Hence the possible values of T that are of interest to us are the integers from 5 to 350.

If $H_0$ is true, then what is a typical value for T and what is not? Remember from Section 20.1 that, because n = 5, the expectation of T is $E[T] = \frac{5}{6}(N+1)$. This means that the distribution of T is centered around $\frac{5}{6}(N+1)$. Hence, if $H_0$ is true, then typical values of T are in the neighborhood of $\frac{5}{6} \cdot 351 = 292.5$. Values of T that deviate a lot from 292.5 are evidence against $H_0$. Values that are much greater than 292.5 are evidence against $H_0$ but provide even stronger evidence against $H_1$. For such values we will not reject $H_0$ in favor of $H_1$. Also values a little smaller than 292.5 are grounds not to reject $H_0$, because we are committed to giving $H_0$ the benefit of the doubt. On the other hand, values of T very close to 5 should be considered as strong evidence against the null hypothesis and are in favor of $H_1$, hence they lead to a decision to reject $H_0$. This is summarized in Figure 25.1.

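[Figure: a number line running from 5 to 350 with 292.5 marked. Values near 5 are in favor of $H_1$, values around 292.5 are in favor of $H_0$, and values above 350 are against both $H_0$ and $H_1$.]

Fig. 25.1. Values of the test statistic T.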
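The center of the credibility scale can be verified by direct computation. The sketch below is illustrative and relies on a standard fact that this excerpt does not state: for the maximum T of n draws without replacement from 1, ..., N, the probability that T equals t is the number of ways to choose the other n − 1 values below t, divided by the number of ways to choose n values from N. The function names `max_pmf` and `expected_max` are hypothetical.

```python
from fractions import Fraction
from math import comb


def max_pmf(t, N, n):
    """P(T = t) for T the maximum of n draws without replacement from 1..N."""
    return Fraction(comb(t - 1, n - 1), comb(N, n))


def expected_max(N, n):
    """E[T] computed directly from the probability mass function."""
    return sum(t * max_pmf(t, N, n) for t in range(n, N + 1))


# Under the null hypothesis N = 350, with n = 5 observed serial numbers.
expected_T = expected_max(350, 5)
print(float(expected_T))
```

Exact rational arithmetic is used so the result can be compared with 5/6 · 351 without rounding error.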

Quick exercise 25.2 Another possible test statistic would be $\bar{X}_5$. If we use its values as a credibility scale for $H_0$, then what are the possible values of $\bar{X}_5$, which values of $\bar{X}_5$ are in favor of $H_1: N < 350$, and which values are in favor of $H_0: N = 350$?


For the data we find

$$t = \max\{61, 19, 56, 24, 16\} = 61$$

as the realization of the test statistic. How do we use this to decide on $H_0$?

25.2 Tail probabilities

As we have just seen, if $H_0$ is true, then typical values of T are in the neighborhood of $\frac{5}{6} \cdot 351 = 292.5$. In view of Figure 25.1, the more a value of T is to the left, the stronger evidence it provides in favor of $H_1$. The value 61 is in the left region of Figure 25.1. Can we now reject $H_0$ and conclude that N is smaller than 350, or can the fact that we observe 61 as maximum be attributed to chance? In courtroom terminology: can we reach the conclusion that the null hypothesis is false beyond reasonable doubt? One way to investigate this is to examine how likely it is that one would observe a value of T that provides even stronger evidence against $H_0$ than 61, in the situation that N = 350. If this is very unlikely, then 61 already bears strong evidence against $H_0$.

Values of T that provide stronger evidence against $H_0$ than 61 are to the left of 61. Therefore we compute P(T ≤ 61). In the situation that N = 350, the test statistic T is the maximum of 5 numbers drawn without replacement from 1, 2, ..., 350. We find that

$$P(T \le 61) = P(\max\{X_1, \ldots, X_5\} \le 61) = \frac{61}{350} \cdot \frac{60}{349} \cdots \frac{57}{346} = 0.00014.$$

This probability is so small that we view the value 61 as strong evidence against the null hypothesis. Indeed, if the null hypothesis were true, then values of T that would provide the same or even stronger evidence against $H_0$ than 61 are very unlikely to occur, i.e., they occur with probability 0.00014! In other words, the observed value 61 is exceptionally small in case $H_0$ is true.
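The left tail probability can be reproduced with a short calculation. This is a minimal sketch; the function name `left_tail_probability` is illustrative. It uses the fact that the maximum of n draws without replacement is at most t exactly when all n draws come from 1, ..., t, which gives the same product of fractions as in the text.

```python
from math import comb


def left_tail_probability(t, N, n):
    """P(T <= t) for T the maximum of n draws without replacement from 1..N."""
    if t < n:
        return 0.0  # fewer than n distinct values are available below t
    return comb(min(t, N), n) / comb(N, n)


# Observed maximum 61, hypothesized total N = 350, n = 5 serial numbers.
p_left = left_tail_probability(61, 350, 5)
print(f"{p_left:.5f}")
```

Counting subsets with `math.comb` avoids multiplying the five fractions by hand and extends directly to other values of t, N, and n.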

At this point we can do two things: either we believe that $H_0$ is true and that something very unlikely has happened, or we believe that events with such a small probability do not happen in practice, so that T ≤ 61 could only have occurred because $H_0$ is false. We choose to believe that things happening with probability 0.00014 are so exceptional that we reject the null hypothesis $H_0: N = 350$ in favor of the alternative hypothesis $H_1: N < 350$.

In courtroom terminology: we say that a value of T smaller than or equal to

61 implies that the null hypothesis is false beyond reasonable doubt.

P-values

In our example, the more a value of T is to the left, the stronger evidence it provides against $H_0$. For this reason we computed the left tail probability
