8.4  Comparing Two Population Means: Paired Data
Unit Summary 

Reading Assignment
An Introduction to Statistical Methods and Data Analysis (see Course Schedule).
Inferences About the Difference Between Two Population Means for Paired Data
Paired samples: each observation in the sample from the first population is matched with (related to) a specific observation in the sample from the second population.
It is important to distinguish between independent samples and paired samples. Some examples are given below.
Compare the time that males and females spend watching TV.
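Think about the following questions before reading the answers.
A. We randomly select 20 males and 20 females and compare the average time they spend watching TV. Is this an independent sample or a paired sample? (Answer: independent samples. The males and females are selected separately, so no observation in one group is matched with a particular observation in the other.)
B. We randomly select 20 couples and compare the time the husbands and wives spend watching TV. Is this an independent sample or a paired sample? (Answer: paired samples. Each husband is matched with his wife.)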
The paired t-test is used for hypothesis testing with paired data.
The Paired t-Procedure
Assumptions:
1. Paired samples.
2. The differences of the pairs follow a normal distribution, or the number of pairs is large. (If the number of pairs is less than 30, we need to check whether the differences are normal; we do not need to check the normality of each population.)
Hypotheses:
\(H_0: \mu_d = 0\)
\(H_a: \mu_d \ne 0\)
OR
\(H_0: \mu_d = 0\)
\(H_a: \mu_d < 0\)
OR
\(H_0: \mu_d = 0\)
\(H_a: \mu_d > 0\)
t-statistic:
Let \(d\) denote the differences between the paired observations, and let \(\bar{d}\) be the mean of these differences.
The test statistic is: \(t^{*}=\frac{\bar{d}-0}{s_d/\sqrt{n}}\)
degrees of freedom = \(n-1\)
where \(n\) denotes the number of pairs (equivalently, the number of differences).
Paired t-interval:
\[\bar{d}\pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}\]
Note: \(s_{\bar{d}}=\frac{s_d}{\sqrt{n}}\), where \(s_d\) is the standard deviation of the sample differences and \(s_{\bar{d}}\) is the estimated standard error of \(\bar{d}\).
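The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not course software: the function name `paired_t` is hypothetical, it uses only the standard library, and the critical value \(t_{\alpha/2}\) is passed in from a t-table rather than computed.

```python
import math
import statistics


def paired_t(first, second, t_crit):
    """Paired t-statistic, df, and two-sided interval for mu_d, with d = first - second.

    t_crit is t_{alpha/2} with n - 1 degrees of freedom, taken from a t-table
    (an input here, not computed).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(first, second)]  # one difference per pair
    n = len(d)                                  # number of pairs
    d_bar = statistics.mean(d)                  # mean difference
    s_d = statistics.stdev(d)                   # sample SD of the differences (n - 1 divisor)
    se = s_d / math.sqrt(n)                     # s_dbar = s_d / sqrt(n)
    t_star = (d_bar - 0) / se                   # hypothesized mu_d = 0
    interval = (d_bar - t_crit * se, d_bar + t_crit * se)
    return t_star, n - 1, interval
```

For the zinc example in this section, `first` would hold the bottom-water values and `second` the surface-water values; a two-sided 95% interval with df = 9 would use `t_crit = 2.262`.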
Example: Drinking Water
Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health hazard. At each of ten locations, zinc concentration was measured in both bottom water and surface water, giving ten pairs of data (zinc_conc.txt).
Does the data suggest that the true average concentration in the bottom water exceeds that of surface water?
Location | Zinc concentration in bottom water | Zinc concentration in surface water
1 | .430 | .415
2 | .266 | .238
3 | .567 | .390
4 | .531 | .410
5 | .707 | .605
6 | .716 | .609
7 | .651 | .632
8 | .589 | .523
9 | .469 | .411
10 | .723 | .612
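As a check on the arithmetic, the differences and their summary statistics can be computed directly from the table. A minimal Python sketch (standard library only; the variable names are illustrative):

```python
import math
import statistics

# Zinc concentrations from the table, listed by location 1-10.
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]

# Pairing is by location: subtract within each location, never across locations.
# Rounding to 3 decimals only removes floating-point noise (inputs have 3 decimals).
diff = [round(b - s, 3) for b, s in zip(bottom, surface)]

n = len(diff)
d_bar = statistics.mean(diff)   # mean of the differences
s_d = statistics.stdev(diff)    # sample standard deviation of the differences
se = s_d / math.sqrt(n)         # standard error of the mean difference

print("differences:", diff)
print(f"n = {n}, mean = {d_bar:.4f}, StDev = {s_d:.4f}, SE Mean = {se:.4f}")
```

The mean, standard deviation and standard error of the differences agree with the Difference row of the Minitab output in this section (0.0804, 0.0523 and 0.0165).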

To perform a paired t-test for the trace metal example above:
Assumptions:
1. Is this a paired sample?  Yes.
2. Is this a large sample?  No.
3. Since the number of pairs (10) is less than 30, we need to check whether the differences follow a normal distribution.
In Minitab, we can use Calc > Calculator to obtain diff = bottom - surface and then create a normal probability plot of the differences.
Based on the probability plot, we conclude that the differences may come from a normal distribution.
Step 1. Set up the hypotheses:
\(H_0: \mu_d = 0\)
\(H_a: \mu_d > 0\)
where \(d\) is defined as the difference bottom - surface.
Step 2. Write down the significance level \(\alpha = 0.05\).
Step 3. What is the critical value and the rejection region?
\(\alpha = 0.05\), df = 9
\(t_{0.05} = 1.833\)
rejection region: \( t > 1.833\)
Step 4. Compute the value of the test statistic:
\[t^{*}=\frac{\bar{d}}{\frac{s_d }{\sqrt{n}}}=\frac{0.0804}{\frac{0.0523}{\sqrt{10}}}=4.86\]
Step 5. Check whether the test statistic falls in the rejection region and determine whether to reject \(H_0\).
\(t^* = 4.86 > 1.833\)
Reject \(H_0\).
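Steps 3 to 5 reduce to a few lines of arithmetic. The Python sketch below plugs in the summary values from Step 4 and the table value \(t_{0.05} = 1.833\) from Step 3; it does not compute the critical value itself.

```python
import math

# Summary values for the differences (bottom - surface), as used in Step 4.
d_bar, s_d, n = 0.0804, 0.0523, 10
t_crit = 1.833  # t_{0.05} with df = 9, taken from a t-table (Step 3)

t_star = d_bar / (s_d / math.sqrt(n))  # paired t statistic, hypothesized mu_d = 0
reject_h0 = t_star > t_crit            # upper-tailed rejection region: t > 1.833
print(f"t* = {t_star:.2f}, reject H0: {reject_h0}")
```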
Step 6. State the conclusion in words.
At \(\alpha = 0.05\), we conclude that, on average, the bottom zinc concentration is higher than the surface zinc concentration.
Using Minitab to Perform a Paired t-Test
You can use the paired t-test in Minitab to perform the test. Alternatively, you can perform a 1-sample t-test on difference = bottom - surface.
1. Stat > Basic Statistics > Paired t
2. Click 'Options' to specify the confidence level for the interval and the alternative hypothesis you want to test. The default hypothesized mean difference is 0.
The Minitab output for paired T for bottom - surface is as follows:
Paired T for bottom - surface

            N    Mean     StDev    SE Mean
bottom      10   0.5649   0.1468   0.0464
surface     10   0.4845   0.1312   0.0415
Difference  10   0.0804   0.0523   0.0165

95% lower bound for mean difference: 0.0501
T-Test of mean difference = 0 (vs > 0): T-Value = 4.86  P-Value = 0.000

Note: In Minitab, if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower confidence bound will be constructed, respectively, rather than a confidence interval.
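Python's standard library has no t-distribution, so the sketch below reproduces the P-Value and the 95% lower bound by integrating the t density with Simpson's rule and finding \(t_{0.05}\) by bisection. It is an illustration under those assumptions, not how Minitab computes its output; the function names are hypothetical, and with SciPy available, `scipy.stats.t.sf` and `scipy.stats.t.ppf` would normally be used instead.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3         # P(T > 0) = 0.5 by symmetry


def t_critical(alpha, df):
    """Approximate t_alpha, the value with P(T > t_alpha) = alpha (0 < alpha < 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):                # bisection: the upper tail decreases in t
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Differences (bottom - surface) for the ten locations.
diff = [0.015, 0.028, 0.177, 0.121, 0.102, 0.107, 0.019, 0.066, 0.058, 0.111]
n = len(diff)
df = n - 1
d_bar = statistics.mean(diff)
se = statistics.stdev(diff) / math.sqrt(n)

t_star = d_bar / se
p_value = t_upper_tail(t_star, df)               # Ha: mu_d > 0 (upper-tailed)
lower_bound = d_bar - t_critical(0.05, df) * se  # one-sided 95% lower bound

print(f"T-Value = {t_star:.2f}  P-Value = {p_value:.4f}")
print(f"95% lower bound for mean difference: {lower_bound:.4f}")
```

The P-Value is a little under 0.0005, which displays as 0.000 at Minitab's three-decimal precision, and the lower bound is \(\bar{d} - t_{0.05}\, s_{\bar{d}}\) with df = 9.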
Using the p-value to draw a conclusion about our example:
p-value = 0.000 < 0.05
Reject \(H_0\) and conclude that bottom zinc concentration is higher than surface zinc concentration.
Note: For the zinc concentration problem, if you do not recognize the paired structure and mistakenly use the 2-sample t-test, treating the measurements as independent samples, you will not be able to reject the null hypothesis. This demonstrates the importance of distinguishing the two types of samples. It also shows why it is wise to design an efficient experiment whenever possible: here, pairing the measurements by location removes the location-to-location variation from the comparison.
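The note can be checked numerically. The Python sketch below (standard library only) computes the 2-sample t statistic that treats the bottom and surface measurements as independent, alongside the paired statistic for the same data. With equal sample sizes the pooled and unpooled standard errors coincide, so the sketch uses the unpooled form.

```python
import math
import statistics

# Zinc concentrations from the table, listed by location 1-10.
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]

# 2-sample t statistic that (wrongly, here) treats the two columns as independent.
n1, n2 = len(bottom), len(surface)
se_two = math.sqrt(statistics.variance(bottom) / n1 + statistics.variance(surface) / n2)
t_two_sample = (statistics.mean(bottom) - statistics.mean(surface)) / se_two

# Paired t statistic: a 1-sample t on the within-location differences.
diff = [b - s for b, s in zip(bottom, surface)]
t_paired = statistics.mean(diff) / (statistics.stdev(diff) / math.sqrt(len(diff)))

print(f"2-sample t = {t_two_sample:.2f}")
print(f"paired t   = {t_paired:.2f}")
```

The 2-sample statistic is about 1.29, below the one-sided critical value \(t_{0.05} = 1.734\) for df = 18, so that test fails to reject \(H_0\), while the paired statistic is 4.86. The mean difference is the same 0.0804 in both; what changes is the standard error, because the large spread between locations (StDev of about 0.13 to 0.15 within each water layer) inflates the 2-sample standard error but cancels out of the within-location differences.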
What if the assumption of normality is not satisfied? In this case we would use a nonparametric 1-sample test on the differences, such as the sign test or the Wilcoxon signed-rank test.
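One common choice of nonparametric test is the Wilcoxon signed-rank test on the differences. The sketch below computes its exact upper-tailed p-value by enumerating all \(2^n\) sign patterns. It is a small-sample illustration in Python with hypothetical function names, not Minitab's implementation, which may use an approximation.

```python
from itertools import product


def average_ranks(values):
    """Ranks 1..n of values, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_upper_p(diffs):
    """W+ and exact upper-tailed Wilcoxon signed-rank p-value (Ha: median difference > 0).

    Zero differences are dropped. All 2**n sign patterns are enumerated, so
    this is only practical for small n (about 20 or fewer).
    """
    d = [x for x in diffs if x != 0]
    ranks = average_ranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    count = 0
    for signs in product((0, 1), repeat=len(d)):
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus:
            count += 1
    return w_plus, count / 2 ** len(d)


# Differences (bottom - surface) for the ten locations.
diffs = [0.015, 0.028, 0.177, 0.121, 0.102, 0.107, 0.019, 0.066, 0.058, 0.111]
w_plus, p = signed_rank_upper_p(diffs)
print(f"W+ = {w_plus}, exact p-value = {p:.4f}")
```

For the zinc data every difference is positive, so \(W^+ = 1 + 2 + \cdots + 10 = 55\) and the exact p-value is \(1/2^{10} \approx 0.001\). The sign test gives the same p-value here, since all 10 signs are positive.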