Estimating a Proportion for a Large Population


Example

A pollster wants to estimate p, the true proportion of all Americans favoring the Democratic candidate with 95% confidence and error ε no larger than 0.03.

How many people should he randomly sample to achieve his goals?

Solution. We'll tackle this problem just as we did for finding the sample size necessary to estimate a population mean. First, note that the pollster's goal is to estimate the population proportion p so that the error is no larger than 0.03. That is, the goal is to calculate a 95% confidence interval such that:

\(\hat{p}\pm \epsilon=\hat{p}\pm 0.03\)

But, we know the formula for a (1−α)100% confidence interval for a population proportion is:

\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)

So, just as we did on the previous page, we'll proceed by equating the terms appearing after each of the above ± signs, and solve for n. That is, equate:

\(\epsilon=z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)

and solve for n. Multiplying through by the square root of n, we get:

\(\epsilon \sqrt{n}=z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}\)

And, dividing through by ε and squaring both sides, we get:

\(n=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}\)

Again, before we make the calculation for our particular example, let's take a step back and summarize the formula that we have just derived.

Definition. The sample size necessary for estimating a population proportion p of a large population with (1−α)100% confidence and error no larger than ε is:

\(n=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}\)
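As a rough illustration, the formula above can be sketched in Python. The function name `sample_size_for_proportion` is hypothetical, not something from these notes. The sketch assumes the standard-library `statistics.NormalDist` for the normal quantile, so it uses the unrounded value of z rather than a table value such as 1.96, and it rounds the result up to the next whole person.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_hat, epsilon, confidence=0.95):
    """Return the smallest whole n with z * sqrt(p_hat * (1 - p_hat) / n) <= epsilon.

    p_hat      -- planning value for the sample proportion (an assumption you supply)
    epsilon    -- largest acceptable error
    confidence -- confidence level (1 - alpha), e.g. 0.95
    """
    alpha = 1.0 - confidence
    # z_{alpha/2}: the upper alpha/2 quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n = z ** 2 * p_hat * (1.0 - p_hat) / epsilon ** 2
    # Round up, since a fractional person cannot be sampled
    return math.ceil(n)


print(sample_size_for_proportion(p_hat=0.5, epsilon=0.03, confidence=0.95))
```

Because the unrounded quantile is slightly smaller than 1.96, the value of n before rounding up can differ in the first decimal place from a hand calculation that uses 1.96.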

Just as we needed to have a decent estimate, \(s^2\), of the population variance when calculating the sample size necessary for estimating a population mean μ, we need to have a good estimate, \(\hat{p}\), of the population proportion when calculating the sample size necessary for estimating a population proportion p. Strange, I know... but there are at least two ways out of this conundrum.

Ways to Determine \(\hat{p}(1-\hat{p})\)

(1) You can use your prior knowledge (previous polls, perhaps?) about \(\hat{p}\).

(2) You can set \(\hat{p}(1-\hat{p})=\dfrac{1}{4}\), its maximum value, which occurs when \(\hat{p}=\dfrac{1}{2}\).
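One way to see why 1/4 is the largest value that \(\hat{p}(1-\hat{p})\) can take is to complete the square:

\(\hat{p}(1-\hat{p})=\dfrac{1}{4}-\left(\hat{p}-\dfrac{1}{2}\right)^2\le \dfrac{1}{4}\)

with equality only when \(\hat{p}=\dfrac{1}{2}\). Substituting 1/4 into the sample size formula therefore gives the most conservative (largest) sample size for a given ε and confidence level:

\(n=\dfrac{z^2_{\alpha/2}}{4\epsilon^2}\)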


Example (continued)

A pollster wants to estimate p, the true proportion of all Americans favoring the Democratic candidate with 95% confidence and error ε no larger than 0.03.

How many people should he randomly sample to achieve his goals?

Solution. If the maximum error ε is 0.03, the confidence level is 95% (so that \(z_{0.025}=1.96\)), and the sample proportion is 0.8, we need to survey:

\(n=\dfrac{(1.96)^2(0.8)(0.2)}{0.03^2}=682.95\)

or 683 people to estimate p with 95% confidence. Again, when making sample size calculations such as this one, it is a good idea to change all of the factors to see what the "cost" is in sample size for achieving certain errors ε and confidence levels (1−α). Doing that here, we get:

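[Table: sample size n required for various errors ε and confidence levels (1−α), using a sample proportion of 0.8]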
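A grid of this kind can be regenerated with a short script. The sketch below is only an illustration: the particular errors and confidence levels in `errors` and `confidences` are assumptions chosen for the example, not the values from the original table, and the helper names `required_n` and `sample_size_table` are hypothetical. It uses the unrounded normal quantile from the standard-library `statistics.NormalDist` and rounds each n up.

```python
import math
from statistics import NormalDist


def required_n(p_hat, epsilon, confidence):
    """Sample size for a proportion: ceil(z^2 * p_hat * (1 - p_hat) / epsilon^2)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil(z ** 2 * p_hat * (1.0 - p_hat) / epsilon ** 2)


def sample_size_table(p_hat, errors, confidences):
    """Map each (epsilon, confidence) pair to its required sample size."""
    return {
        (eps, conf): required_n(p_hat, eps, conf)
        for eps in errors
        for conf in confidences
    }


# Illustrative grid (assumed values, not taken from the original table)
errors = [0.01, 0.03, 0.05]
confidences = [0.90, 0.95, 0.99]

table = sample_size_table(0.8, errors, confidences)

# Print one row per error, one column per confidence level
print("epsilon " + "".join(f"{conf:>8.0%}" for conf in confidences))
for eps in errors:
    print(f"{eps:<8}" + "".join(f"{table[(eps, conf)]:>8}" for conf in confidences))
```

Reading across a row shows the cost of higher confidence, and reading down a column shows the cost of a smaller error: halving ε roughly quadruples n, because ε enters the formula squared.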

We, of course, can also change the sample proportion. For example, if we change the sample proportion to 0.5, then we need to survey:

\(n=\dfrac{(1.96)^2(0.5)(0.5)}{0.03^2}=1067.1\)

or 1068 people to estimate p with 95% confidence. The two calculations in this example illustrate how useful it is to have some idea of the magnitude of the sample proportion. In one case, if the proportion is close to 0.80, then we'd need only about 680 people. On the other hand, if the proportion is close to 0.50, then we'd need about 1070 people. That difference in necessary sample size sure argues for a small pilot study in advance of the larger survey.
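To get a feel for how strongly the planning value of the sample proportion drives the answer, one can sweep it over a range while holding ε at 0.03 and the confidence level at 95%. This is a minimal sketch: the helper name `required_n` and the list of proportions are assumptions made for illustration, and the unrounded normal quantile from `statistics.NormalDist` is used in place of 1.96.

```python
import math
from statistics import NormalDist


def required_n(p_hat, epsilon=0.03, confidence=0.95):
    """Sample size for a proportion: ceil(z^2 * p_hat * (1 - p_hat) / epsilon^2)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil(z ** 2 * p_hat * (1.0 - p_hat) / epsilon ** 2)


# Illustrative planning values for the sample proportion
proportions = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
sizes = {p: required_n(p) for p in proportions}

for p, n in sizes.items():
    print(f"p_hat = {p:.1f}  ->  n = {n}")

# The planning value that demands the largest sample
worst_case = max(sizes, key=sizes.get)
print("largest n occurs at p_hat =", worst_case)
```

The required n peaks at a proportion of 0.5 and falls off toward 0 and 1, which is why a pilot estimate far from 0.5 can save a substantial number of interviews.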

By the way, just as we did for the case in which the sample proportion was 0.8, we can change the factors to see what the "cost" is in sample size for achieving certain errors ε and confidence levels (1−α). Doing that here, we get:

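[Table: sample size n required for various errors ε and confidence levels (1−α), using a sample proportion of 0.5]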