# 8.4 - Comparing Two Population Means: Paired Data

Unit Summary

1. Inferences About the Difference Between Two Population Means for Paired Data
2. The Paired t-Procedure
3. An Example for the Paired t-Test
4. Using Minitab to Perform a Paired t-Test

Reading: An Introduction to Statistical Methods and Data Analysis (see Course Schedule).

### Inferences About the Difference Between Two Population Means for Paired Data

Paired samples: Each observation in the sample from the first population is matched with a corresponding observation in the sample from the second population.

It is important to distinguish between independent samples and paired samples. Consider the following example.

Compare the time that males and females spend watching TV.

Think about the following two scenarios.

A. We randomly select 20 males and 20 females and compare the average time they spend watching TV. Is this an independent sample or paired sample?

B. We randomly select 20 couples and compare the time the husbands and wives spend watching TV. Is this an independent sample or paired sample?

The paired t-test will be used when handling hypothesis testing for paired data.

### The Paired t-Procedure

Assumptions:

1. Paired samples
2. The differences between the pairs follow a normal distribution, or the number of pairs is large. (If the number of pairs is less than 30, we need to check whether the differences are normal, but we do not need to check the normality of each population.)

Hypotheses:

$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$

OR

$H_0: \mu_d = 0$
$H_a: \mu_d < 0$

OR

$H_0: \mu_d = 0$
$H_a: \mu_d > 0$

t-statistic:

Let $d$ denote the differences between the pairs of data, $\bar{d}$ the mean of these differences, and $s_d$ their sample standard deviation.

The test statistic is: $t^{*}=\frac{\bar{d}-0}{s_d/\sqrt{n}}$

degrees of freedom = n - 1
where n denotes the number of pairs or the number of differences.
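The calculation above can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the function name `paired_t_statistic` and the four pairs of measurements are hypothetical, and only the standard library is used.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t*, df) for H0: mu_d = 0, where d = first - second."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(first, second)]  # one difference per pair
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation (n - 1 divisor)
    t_star = (d_bar - 0) / (s_d / math.sqrt(n))
    return t_star, n - 1


# Hypothetical measurements on the same four subjects (illustration only).
after = [12, 15, 10, 14]
before = [10, 12, 9, 11]
t_star, df = paired_t_statistic(after, before)
print(f"t* = {t_star:.2f}, df = {df}")
```

Note that the two samples are never summarized separately: everything is computed from the single list of differences.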

Paired t-interval:

$\bar{d}\pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}$

Note: $s_{\bar{d}}=\frac{s_d}{\sqrt{n}}$, where $s_d$ is the standard deviation of the sample differences and $s_{\bar{d}}$ is the standard error of the mean difference.
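As a rough sketch of the interval formula, the function below takes the list of differences and a critical value looked up by the user. The function name `paired_t_interval` and the data are hypothetical; the critical value 3.182 is the standard t-table entry for $t_{0.025}$ with 3 degrees of freedom and is supplied by hand rather than computed.

```python
import math
from statistics import mean, stdev


def paired_t_interval(d, t_crit):
    """Return (lower, upper) for d_bar +/- t_crit * s_d / sqrt(n)."""
    n = len(d)
    d_bar = mean(d)
    standard_error = stdev(d) / math.sqrt(n)  # s_d / sqrt(n)
    margin = t_crit * standard_error
    return d_bar - margin, d_bar + margin


# Hypothetical differences for four pairs (illustration only).
d = [2, 3, 1, 3]
t_crit = 3.182  # t_{0.025} with n - 1 = 3 df, taken from a t table
lower, upper = paired_t_interval(d, t_crit)
print(f"95% interval for mu_d: ({lower:.3f}, {upper:.3f})")
```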

### Example: Drinking Water

Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water (zinc_conc.txt).

Do the data suggest that the true average concentration in the bottom water exceeds that of the surface water?

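| Location | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Zinc concentration in bottom water | .430 | .266 | .567 | .531 | .707 | .716 | .651 | .589 | .469 | .723 |
| Zinc concentration in surface water | .415 | .238 | .390 | .410 | .605 | .609 | .632 | .523 | .411 | .612 |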

To perform a paired t-test for the previous trace metal example:

Assumptions:

1. Is this a paired sample? - Yes.

2. Is this a large sample? - No.

3. Since the sample size is not large enough (less than 30), we need to check whether the differences follow a normal distribution.

In Minitab, we can use Calc > Calculator to obtain diff = bottom - surface and then create a probability plot of the differences.

Based on the probability plot, we conclude that the differences may come from a normal distribution.
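For readers without Minitab, the coordinates of a normal probability plot can be approximated with the Python standard library. This sketch is an assumption-laden stand-in, not Minitab's procedure: it uses the plotting positions $(i - 0.5)/n$, which may differ from the ones Minitab uses, and it summarizes the straightness of the plot with a correlation coefficient rather than a formal normality test.

```python
import math
from statistics import NormalDist, mean

bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]

# Ordered differences (bottom - surface), one per location.
diff = sorted(b - s for b, s in zip(bottom, surface))
n = len(diff)

# Normal scores at the assumed plotting positions (i - 0.5) / n.
scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = correlation(scores, diff)
for z, d in zip(scores, diff):
    print(f"{z:7.3f}  {d:.3f}")  # plot d against z to see the probability plot
print(f"probability-plot correlation: {r:.3f}")
```

Points that fall close to a straight line (a correlation near 1) are consistent with normally distributed differences.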

Step 1. Set up the hypotheses:

$H_0: \mu_d = 0$
$H_a: \mu_d > 0$

where 'd' is defined as the difference of bottom - surface.

Step 2. Write down the significance level $\alpha = 0.05$.

Step 3. What is the critical value and the rejection region?

$\alpha = 0.05$, df = 9
$t_{0.05} = 1.833$
rejection region: $t > 1.833$

Step 4. Compute the value of the test statistic:

$t^{*}=\frac{\bar{d}}{\frac{s_d }{\sqrt{n}}}=\frac{0.0804}{\frac{0.0523}{\sqrt{10}}}=4.86$

Step 5. Check whether the test statistic falls in the rejection region and determine whether to reject $H_0$.

$t^* = 4.86 > 1.833$
reject $H_0$
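Steps 3 to 5 can be reproduced from the raw data with a short Python sketch. It uses only the standard library, so the critical value 1.833 is copied from Step 3 rather than computed; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]

# d = bottom - surface, as defined in Step 1.
diff = [b - s for b, s in zip(bottom, surface)]
n = len(diff)
d_bar = mean(diff)
s_d = stdev(diff)  # sample standard deviation of the differences
t_star = d_bar / (s_d / math.sqrt(n))

t_crit = 1.833  # t_{0.05} with 9 df, from Step 3
reject = t_star > t_crit  # upper-tailed rejection region

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t* = {t_star:.2f}")
print("reject H0" if reject else "fail to reject H0")
```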

Step 6. State the conclusion in words.

At $\alpha = 0.05$, we conclude that, on average, the bottom zinc concentration is higher than the surface zinc concentration.

### Using Minitab to Perform a Paired t-Test

You can use a paired t-test in Minitab to perform the test. Alternatively, you can perform a 1-sample t-test on difference = bottom - surface.

1. Stat > Basic Statistics > Paired t

2. Click 'Options' to specify the confidence level for the interval and the alternative hypothesis you want to test. The default hypothesized mean difference is 0.

The Minitab output for paired T for bottom - surface is as follows:

Paired T for bottom - surface

| | N | Mean | StDev | SE Mean |
|---|---|---|---|---|
| bottom | 10 | 0.5649 | 0.1468 | 0.0464 |
| surface | 10 | 0.4845 | 0.1312 | 0.0415 |
| Difference | 10 | 0.0804 | 0.0523 | 0.0165 |

95% lower bound for mean difference: 0.0501
T-Test of mean difference = 0 (vs > 0): T-Value = 4.86 P-Value = 0.000

Note: In Minitab, if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower confidence bound will be constructed, respectively, rather than a confidence interval.


Using the p-value to draw a conclusion about our example:

p-value = 0.000 < 0.05

Reject $H_0$ and conclude that bottom zinc concentration is higher than surface zinc concentration.
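Minitab rounds the p-value to three decimals, so 0.000 means the p-value is smaller than 0.0005, not exactly zero. One way to see the unrounded value without statistical software is to integrate the t density numerically. The sketch below is one such approach, using Simpson's rule; the function names `t_pdf` and `upper_tail_p` are hypothetical, and the inputs 4.86 and 9 are the test statistic and degrees of freedom from the example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t_value, df, steps=20000):
    """P(T > t_value): 0.5 minus the Simpson's-rule integral over [0, t_value]."""
    if t_value < 0:
        return 1 - upper_tail_p(-t_value, df, steps)
    h = t_value / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_value = upper_tail_p(4.86, 9)  # upper-tailed test, df = n - 1 = 9
print(f"p-value = {p_value:.6f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```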

Note: For the zinc concentration problem, if you fail to recognize the paired structure and mistakenly use the 2-sample t-test, treating the samples as independent, you will not be able to reject the null hypothesis. This demonstrates the importance of distinguishing between the two types of samples. It is also wise to design an experiment efficiently whenever possible; here, pairing the measurements by location removes the location-to-location variability from the comparison.
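To make the contrast concrete, the following sketch computes both test statistics from the same data. The 2-sample statistic here uses the unpooled standard error; with equal sample sizes the pooled version gives the same value. For comparison, the upper 5% critical value for 18 degrees of freedom is about 1.73, a standard t-table value that is not given in the text above.

```python
import math
from statistics import mean, stdev

bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]
n = len(bottom)

# Incorrect analysis: treat the two columns as independent samples.
se_indep = math.sqrt(stdev(bottom) ** 2 / n + stdev(surface) ** 2 / n)
t_indep = (mean(bottom) - mean(surface)) / se_indep

# Correct analysis: work with the within-location differences.
diff = [b - s for b, s in zip(bottom, surface)]
se_paired = stdev(diff) / math.sqrt(n)
t_paired = mean(diff) / se_paired

print(f"2-sample t = {t_indep:.2f} (SE = {se_indep:.4f})")
print(f"paired t   = {t_paired:.2f} (SE = {se_paired:.4f})")
```

Both analyses estimate the same mean difference, 0.0804. Only the standard error changes, because the differences vary much less than the individual concentrations do.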

What if the assumption of normality is not satisfied? In this case, we would use a nonparametric 1-sample test on the differences, such as the sign test or the Wilcoxon signed-rank test.
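As one possible illustration, the sign test can be carried out with the standard library alone. The text does not say which nonparametric test the course uses; the sign test is chosen here only because it is the simplest to sketch. Under $H_0$, each nonzero difference is equally likely to be positive or negative, so the number of positive differences follows a Binomial(n, 0.5) distribution.

```python
from math import comb

bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716, 0.651, 0.589, 0.469, 0.723]
surface = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609, 0.632, 0.523, 0.411, 0.612]

diff = [b - s for b, s in zip(bottom, surface)]
n_nonzero = sum(d != 0 for d in diff)  # ties (zero differences) are dropped
n_pos = sum(d > 0 for d in diff)

# Upper-tailed p-value: P(X >= n_pos) for X ~ Binomial(n_nonzero, 0.5).
p_sign = sum(comb(n_nonzero, k) for k in range(n_pos, n_nonzero + 1)) / 2 ** n_nonzero

print(f"{n_pos} of {n_nonzero} differences are positive, p-value = {p_sign:.6f}")
```

The sign test uses only the direction of each difference, so it generally has less power than the paired t-test when the normality assumption does hold.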