9.1 Multi-Stage Sampling: Two Stages with S.R.S. at Each Stage
Unit Summary 

We have learned about cluster sampling, where one selects primary units and then observes every secondary unit within each selected primary unit. With multi-stage sampling, we select only some of the secondary units within each selected primary unit.
For example, in two-stage sampling:
 1st stage: sample n primary units
 2nd stage: for the ith selected primary unit, select m_{i} (not all) of its secondary units
Multistage designs are used in many practical cases. These are just a few:
 Large surveys involving the sampling of housing units: the U.S. Census Bureau selects geographical areas within each state and then selects housing units within each selected geographical area.
 Practical quality control problems often involve two (or more) stages of sampling. For example, Ford wants to inspect the quality of a supplier of air filters. They first sample some cartons and then inspect some air filters inside these selected cartons.
 The Gallup Poll samples approximately 300 election districts. At the second stage, it selects 5 households per district.
Notation:
 N : number of primary units in the population
 M_{i} : number of secondary units in the ith primary unit
 total y-value of the ith primary unit : \(y_i=\sum\limits_{j=1}^{M_i}y_{ij}\)
 population total : \(\tau=\sum\limits_{i=1}^N \sum\limits_{j=1}^{M_i}y_{ij}\)
 population mean per secondary unit : \(\mu=\dfrac{\tau}{M}\) where \(M=\sum\limits_{i=1}^N M_i\)
 n : number of primary units selected in the first stage
 m_{i} : number of secondary units selected from the ith selected primary unit in the second stage
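To make the notation concrete, here is a minimal Python sketch on a tiny population. The three primary units and their y-values are made up purely for illustration and are not from the course data.

```python
# Hypothetical toy population: N = 3 primary units with made-up y-values.
population = [
    [2, 4, 6],       # primary unit 1, M_1 = 3
    [1, 3],          # primary unit 2, M_2 = 2
    [5, 5, 5, 5],    # primary unit 3, M_3 = 4
]

N = len(population)                        # number of primary units
M_i = [len(unit) for unit in population]   # secondary units in each primary unit
y_i = [sum(unit) for unit in population]   # total y-value of each primary unit
tau = sum(y_i)                             # population total
M = sum(M_i)                               # total number of secondary units
mu = tau / M                               # population mean per secondary unit

print(N, M_i, y_i, tau, M, mu)
```

In a real survey only the sampled values are observed, so τ and μ have to be estimated rather than computed this way.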
Think About It!
Two-stage sampling includes both one-stage cluster sampling and stratified random sampling as special cases. When does two-stage sampling reduce to cluster sampling? When does two-stage sampling reduce to stratified random sampling?
Multistage Design
Multi-stage designs arise quite often in practice, so we need to understand how this type of sampling design is implemented. Most of the time it involves two stages of sampling, with simple random sampling at each stage.
[Figure: two examples of a two-stage sample, each drawn from a population of N = 50 primary units.]
 (i) Two-stage sample of 10 primary units and four secondary units per primary unit.
 (ii) Two-stage sample of 20 primary units and two secondary units per primary unit.
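The two designs can also be imitated in code. The sketch below draws a two-stage sample with simple random sampling at each stage using Python's `random.sample`. The function name `draw_two_stage_sample` is illustrative, and the assumption that every primary unit contains 8 secondary units is hypothetical, since the cluster sizes in the figure are not stated.

```python
import random

def draw_two_stage_sample(cluster_sizes, n, m, seed=None):
    """Return {primary unit index: sorted list of sampled secondary unit indices}."""
    rng = random.Random(seed)
    N = len(cluster_sizes)
    # Stage 1: simple random sample of n primary units, without replacement.
    primary = sorted(rng.sample(range(N), n))
    # Stage 2: simple random sample of m secondary units within each selected unit.
    return {i: sorted(rng.sample(range(cluster_sizes[i]), m)) for i in primary}

# Assumption for illustration only: N = 50 primary units with 8 secondary units each.
cluster_sizes = [8] * 50

sample_i = draw_two_stage_sample(cluster_sizes, n=10, m=4, seed=1)    # design (i)
sample_ii = draw_two_stage_sample(cluster_sizes, n=20, m=2, seed=1)   # design (ii)

print(sample_i)
print(sample_ii)
```

Both designs observe 40 secondary units in total. They differ in how those observations are spread across primary units.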
Two-stage cluster sampling with simple random sampling at each stage
We will discuss two possible estimators for this sampling design: an unbiased estimator and a ratio estimator.
A. Unbiased Estimator
Since simple random sampling is used in the second stage, an unbiased estimator of the total y-value for the ith primary unit is:
\(\hat{y}_i=M_i \dfrac{\sum\limits_{j=1}^{m_i}y_{ij}}{m_i}=M_i \bar{y}_i\) where \(\bar{y}_i=\dfrac{\sum\limits_{j=1}^{m_i}y_{ij}}{m_i}\)
The first part of this formula is also known as the expansion estimator.
Also, since simple random sampling is used in the first stage, an unbiased estimator for the population total is:
\(\hat{\tau}=N\cdot \dfrac{\sum\limits_{i=1}^n\hat{y}_i}{n}=N \cdot \dfrac{\sum\limits_{i=1}^n M_i \bar{y}_i}{n}\)
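As a quick check on how the two expansion steps fit together, here is a minimal Python sketch. The function name `expansion_total` and the small data set (N = 5 primary units, 2 of them sampled) are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
from statistics import mean

def expansion_total(N, clusters):
    """clusters: list of (M_i, sampled y-values) for the n sampled primary units."""
    # Stage 2 expansion: y_hat_i = M_i * ybar_i estimates each primary unit total.
    y_hat = [M_i * mean(ys) for M_i, ys in clusters]
    # Stage 1 expansion: tau_hat = N * (average of the estimated primary unit totals).
    tau_hat = N * mean(y_hat)
    return tau_hat, y_hat

# Hypothetical data: N = 5 primary units in the population, n = 2 sampled.
clusters = [(10, [3, 5, 4]), (6, [2, 4])]
tau_hat, y_hat = expansion_total(5, clusters)

print(y_hat)    # estimated totals for the two sampled primary units
print(tau_hat)  # estimated population total
```

Here the sample means are 4 and 3, so the estimated primary unit totals are 40 and 18, and the estimated population total is 5 × 29 = 145.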
Now we have the expansion estimators from each stage. The next thing we need is the variance.
The estimated variance of \(\hat{\tau}\) is:
\(\hat{V}ar(\hat{\tau})=N(N-n)\dfrac{s^2_u}{n}+\dfrac{N}{n}\sum\limits_{i=1}^n M_i (M_i-m_i) \dfrac{s^2_i}{m_i}\)
s_{u}^{2} is the sample variance among the estimated primary unit totals,
s_{i}^{2} is the sample variance within the ith primary unit, here
\(s^2_u=\dfrac{1}{n-1}\sum\limits_{i=1}^n \left(\hat{y}_i-\dfrac{\sum\limits_{i=1}^n \hat{y}_i}{n}\right)^2\), and \(s^2_i=\dfrac{1}{m_i-1}\sum\limits_{j=1}^{m_i}(y_{ij}-\bar{y}_i)^2\)
To estimate the population mean μ = τ / M, the estimator and its estimated variance are:
\(\hat{\mu}=\dfrac{N}{M}\cdot \dfrac{\sum\limits_{i=1}^n \hat{y}_i}{n}\), and \(\hat{V}ar(\hat{\mu})=\dfrac{1}{M^2}\hat{V}ar(\hat{\tau})\)
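The variance formula has a between-primary-unit part and a within-primary-unit part. The Python sketch below computes both, assuming hypothetical data (N = 5, M = 40, two sampled primary units). The function name `two_stage_unbiased` is illustrative only.

```python
from statistics import mean, variance

def two_stage_unbiased(N, M, clusters):
    """clusters: list of (M_i, sampled y-values); each sample needs at least 2 values."""
    n = len(clusters)
    y_hat = [M_i * mean(ys) for M_i, ys in clusters]
    tau_hat = N * mean(y_hat)
    # Between part: N(N - n) s_u^2 / n, with s_u^2 the sample variance of the y_hat_i.
    between = N * (N - n) * variance(y_hat) / n
    # Within part: (N / n) * sum of M_i (M_i - m_i) s_i^2 / m_i.
    within = (N / n) * sum(
        M_i * (M_i - len(ys)) * variance(ys) / len(ys) for M_i, ys in clusters
    )
    var_tau = between + within
    # The mean estimator rescales the total by 1/M and its variance by 1/M^2.
    return tau_hat / M, var_tau / M**2, tau_hat, var_tau

# Hypothetical data: N = 5 primary units, M = 40 secondary units, n = 2 sampled.
clusters = [(10, [3, 5, 4]), (6, [2, 4])]
mu_hat, var_mu, tau_hat, var_tau = two_stage_unbiased(5, 40, clusters)

print(tau_hat, var_tau)
print(mu_hat, var_mu)
```

With so few primary units sampled, the between part dominates the estimated variance, which is typical of two-stage designs.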
Let's take a look at an example where we can compute both the estimates and their variances.
Example: Restaurant Employee Satisfaction
A restaurant chain wants to estimate the average employee satisfaction with their job (on a scale from 1 to 7). The chain has 120 restaurants with a total of 6860 employees. Simple random sampling is used to select 10 restaurants, and then simple random sampling is used again to select and interview about 20% of the employees in each selected restaurant. The data are given as follows.
| Restaurant | M_{i} | m_{i} | Employee Satisfaction | \(\bar{y}_i\) | s_{i} |
|---|---|---|---|---|---|
| 1 | 54 | 10 | 5, 7, 6, 5, 4, 7, 6, 6, 4, 5 | 5.50 | 1.08 |
| 2 | 48 | 10 | 7, 7, 7, 6, 5, 4, 7, 7, 6, 6 | 6.20 | 1.03 |
| 3 | 68 | 14 | 5, 6, 5, 6, 4, 5, 6, 5, 4, 5, 4, 6, 5, 6 | 5.14 | 0.77 |
| 4 | 70 | 14 | 6, 5, 7, 6, 7, 6, 5, 7, 5, 7, 6, 5, 7, 6 | 6.07 | 0.83 |
| 5 | 52 | 10 | 4, 5, 4, 5, 5, 6, 5, 4, 4, 4 | 4.60 | 0.70 |
| 6 | 62 | 12 | 5, 7, 6, 7, 4, 3, 1, 5, 4, 6, 4, 5 | 4.75 | 1.71 |
| 7 | 41 | 8 | 7, 6, 7, 7, 6, 6, 5, 7 | 6.38 | 0.74 |
| 8 | 53 | 11 | 6, 6, 5, 4, 6, 7, 5, 5, 7, 6, 5 | 5.64 | 0.92 |
| 9 | 64 | 12 | 7, 6, 5, 4, 6, 5, 7, 4, 3, 6, 5, 7 | 5.42 | 1.31 |
| 10 | 43 | 9 | 7, 6, 6, 5, 7, 3, 5, 4, 5 | 5.33 | 1.32 |
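The summary columns of the table can be reproduced from the raw scores. The following Python sketch, using only the standard `statistics` module, recomputes m_{i}, \(\bar{y}_i\) and s_{i} for each sampled restaurant. The variable name `restaurants` is just a convenient label for the table rows.

```python
from statistics import mean, stdev

# (M_i, sampled satisfaction scores) for the 10 sampled restaurants, from the table.
restaurants = [
    (54, [5, 7, 6, 5, 4, 7, 6, 6, 4, 5]),
    (48, [7, 7, 7, 6, 5, 4, 7, 7, 6, 6]),
    (68, [5, 6, 5, 6, 4, 5, 6, 5, 4, 5, 4, 6, 5, 6]),
    (70, [6, 5, 7, 6, 7, 6, 5, 7, 5, 7, 6, 5, 7, 6]),
    (52, [4, 5, 4, 5, 5, 6, 5, 4, 4, 4]),
    (62, [5, 7, 6, 7, 4, 3, 1, 5, 4, 6, 4, 5]),
    (41, [7, 6, 7, 7, 6, 6, 5, 7]),
    (53, [6, 6, 5, 4, 6, 7, 5, 5, 7, 6, 5]),
    (64, [7, 6, 5, 4, 6, 5, 7, 4, 3, 6, 5, 7]),
    (43, [7, 6, 6, 5, 7, 3, 5, 4, 5]),
]

for k, (M_i, ys) in enumerate(restaurants, start=1):
    # m_i is the number of sampled employees; stdev uses the m_i - 1 divisor.
    print(k, M_i, len(ys), f"{mean(ys):.2f}", f"{stdev(ys):.2f}")

total_sampled = sum(len(ys) for _, ys in restaurants)     # employees interviewed
total_in_sampled = sum(M_i for M_i, _ in restaurants)     # employees in sampled restaurants
print(total_sampled, total_in_sampled)
```

The last line shows that 110 of the 555 employees in the sampled restaurants were interviewed, which matches the "about 20%" second-stage sampling rate.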
Minitab output:
Here we have output from Minitab that provides the descriptive statistics that you will need to compute the estimators and variance.
Application Exercise
Find the unbiased estimator for the mean employee satisfaction score.
The estimated variance of the unbiased estimator is then:
\(\hat{V}ar(\hat{\tau})=N(N-n)\dfrac{s^2_u}{n}+\dfrac{N}{n}\sum\limits_{i=1}^n M_i (M_i-m_i) \dfrac{s^2_i}{m_i}\)
s_{u}^{2} is the sample variance of \(\hat{y}_1,\ \hat{y}_2,\cdots,\ \hat{y}_{10}\). From the Minitab output, s_{u}^{2} = (58.1)^{2} = 3375.61
s_{i}^{2} is the sample variance within the primary unit.
\(s^2_i=\dfrac{1}{m_i-1}\sum\limits_{j=1}^{m_i}(y_{ij}-\bar{y}_i)^2\)
s_{i} has been computed and given in the table.
Application Exercise
Find the estimated variance of the unbiased estimator for the mean employee satisfaction score.
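One way to check your work on the two exercises above is to carry out the computation in code. The Python sketch below applies the unbiased estimator and its estimated variance to the raw restaurant scores, with N = 120 and M = 6860 taken from the example. The variable names are illustrative. Because it works from unrounded values, results may differ slightly from a hand calculation based on the rounded table entries or on the rounded Minitab value s_{u} = 58.1.

```python
from statistics import mean, variance

N, M = 120, 6860  # restaurants and employees in the chain, from the example

# (M_i, sampled satisfaction scores) for the 10 sampled restaurants.
restaurants = [
    (54, [5, 7, 6, 5, 4, 7, 6, 6, 4, 5]),
    (48, [7, 7, 7, 6, 5, 4, 7, 7, 6, 6]),
    (68, [5, 6, 5, 6, 4, 5, 6, 5, 4, 5, 4, 6, 5, 6]),
    (70, [6, 5, 7, 6, 7, 6, 5, 7, 5, 7, 6, 5, 7, 6]),
    (52, [4, 5, 4, 5, 5, 6, 5, 4, 4, 4]),
    (62, [5, 7, 6, 7, 4, 3, 1, 5, 4, 6, 4, 5]),
    (41, [7, 6, 7, 7, 6, 6, 5, 7]),
    (53, [6, 6, 5, 4, 6, 7, 5, 5, 7, 6, 5]),
    (64, [7, 6, 5, 4, 6, 5, 7, 4, 3, 6, 5, 7]),
    (43, [7, 6, 6, 5, 7, 3, 5, 4, 5]),
]
n = len(restaurants)

# Estimated restaurant totals y_hat_i = M_i * ybar_i and the estimators.
y_hat = [M_i * mean(ys) for M_i, ys in restaurants]
tau_hat = N * mean(y_hat)
mu_hat = tau_hat / M

# Between-restaurant and within-restaurant parts of the estimated variance.
s2_u = variance(y_hat)
between = N * (N - n) * s2_u / n
within = (N / n) * sum(
    M_i * (M_i - len(ys)) * variance(ys) / len(ys) for M_i, ys in restaurants
)
var_tau = between + within
var_mu = var_tau / M**2

print(f"s_u = {s2_u ** 0.5:.2f}")
print(f"mu_hat = {mu_hat:.3f}, estimated variance = {var_mu:.4f}")
print(f"between = {between:.0f}, within = {within:.0f}")
```

The printed s_{u} should agree with the Minitab value of 58.1 after rounding to one decimal place, and the between-restaurant part should make up most of the estimated variance.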
Remark: If M is unknown, we cannot use the unbiased estimator \(\hat{\mu}\).
If the cluster totals are roughly proportional to the cluster sizes, then the ratio estimator is appropriate. We discuss the ratio estimator next.
B. Ratio Estimator
For the population total, the ratio estimator and its estimated variance are:
\(\hat{\tau}_r=\dfrac{\sum\limits_{i=1}^n \hat{y}_i}{\sum\limits_{i=1}^n M_i}\cdot M=\hat{r}M\)
\(\hat{V}ar(\hat{\tau}_r)=\dfrac{N(N-n)}{n}\cdot \dfrac{1}{n-1}\sum\limits_{i=1}^n(\hat{y}_i-M_i\hat{r})^2+\dfrac{N}{n}\sum\limits_{i=1}^n M_i(M_i-m_i)\dfrac{s^2_i}{m_i}\)
For the population mean, the ratio estimator and its estimated variance are:
\(\hat{\mu}_r=\hat{r}\)
\(\hat{V}ar(\hat{\mu}_r)=\dfrac{1}{M^2}\hat{V}ar(\hat{\tau}_r)\)
Application Exercise
For the example using the Restaurant Employee Satisfaction data above, find the ratio estimator for the population mean and its estimated variance.
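To check a hand calculation for this exercise, the following Python sketch applies the ratio estimator formulas to the summary columns of the restaurant table (M_{i}, m_{i}, \(\bar{y}_i\), s_{i}), with N = 120 and M = 6860 from the example. The list names are illustrative. Since the table values are rounded to two decimals, the results are approximate.

```python
N, M = 120, 6860  # restaurants and employees in the chain, from the example

# Summary columns of the restaurant table (rounded as printed there).
M_i = [54, 48, 68, 70, 52, 62, 41, 53, 64, 43]
m_i = [10, 10, 14, 14, 10, 12, 8, 11, 12, 9]
ybar = [5.50, 6.20, 5.14, 6.07, 4.60, 4.75, 6.38, 5.64, 5.42, 5.33]
s_i = [1.08, 1.03, 0.77, 0.83, 0.70, 1.71, 0.74, 0.92, 1.31, 1.32]
n = len(M_i)

# Estimated restaurant totals and the ratio r_hat = sum(y_hat_i) / sum(M_i).
y_hat = [Mi * yb for Mi, yb in zip(M_i, ybar)]
r_hat = sum(y_hat) / sum(M_i)
tau_r = r_hat * M   # ratio estimator of the total
mu_r = r_hat        # ratio estimator of the mean

# Between part uses residuals y_hat_i - M_i * r_hat; within part is as before.
between = N * (N - n) / n * sum(
    (yh - Mi * r_hat) ** 2 for yh, Mi in zip(y_hat, M_i)
) / (n - 1)
within = N / n * sum(
    Mi * (Mi - mi) * s ** 2 / mi for Mi, mi, s in zip(M_i, m_i, s_i)
)
var_tau_r = between + within
var_mu_r = var_tau_r / M ** 2

print(f"mu_r = {mu_r:.3f}, estimated variance = {var_mu_r:.4f}")
```

The within part is the same as for the unbiased estimator. Only the between part changes, because the residuals \(\hat{y}_i-M_i\hat{r}\) replace the deviations of \(\hat{y}_i\) from their mean.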
Remark: If M is unknown, one can use \(\hat{\mu}_r\) and estimate M by:
\(\dfrac{\sum\limits_{i=1}^n M_i}{n}\times N\)
Recall: \(M=\sum\limits_{i=1}^N M_i\)
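For the restaurant example, the estimate of M can be computed from the sampled restaurant sizes alone. A minimal Python sketch, using N = 120 and the M_{i} column of the table (the name `M_hat` is illustrative):

```python
N = 120                                              # restaurants in the chain
M_sampled = [54, 48, 68, 70, 52, 62, 41, 53, 64, 43]  # M_i for the sampled restaurants
n = len(M_sampled)

# Estimate M by N times the average size of the sampled primary units.
M_hat = N * sum(M_sampled) / n
print(M_hat)
```

This gives an estimate of 6660 employees, compared with the known total of 6860 stated in the example.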