# 7.3 - Comparing Two Independent Means - Unpooled and Pooled

We decide whether to apply "pooled" or "unpooled" procedures by comparing the sample standard deviations. RULE OF THUMB: If the larger sample standard deviation is MORE THAN twice the smaller sample standard deviation, perform the analysis using unpooled methods; otherwise, use pooled methods.
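The rule of thumb can be written as a one-line check. This is a minimal sketch; the function name `choose_method` is hypothetical, and the standard deviations passed in are the ones quoted in Examples 1 and 2 of this section.

```python
def choose_method(s1: float, s2: float) -> str:
    """Apply the rule of thumb: unpooled if the larger SD is more than twice the smaller."""
    larger, smaller = max(s1, s2), min(s1, s2)
    return "unpooled" if larger > 2 * smaller else "pooled"


# Sample standard deviations from Example 1 (cholesterol) and Example 2 (study hours)
print(choose_method(47.7, 22.3))   # unpooled, since 47.7 / 22.3 = 2.14 > 2
print(choose_method(10.85, 8.41))  # pooled, since 10.85 / 8.41 = 1.29 <= 2
```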

#### Example 1 (Unpooled):

Cholesterol levels are measured for 28 heart attack patients (2 days after their attacks) and 30 other hospital patients who did not have a heart attack. The response is quantitative so we compare means. It is thought that cholesterol levels will be higher for the heart attack patients, so a one-sided alternative hypothesis is used.

**Step 1:** The null hypothesis is \(H_0: \mu_1 - \mu_2 = 0\) and the alternative is \(H_a: \mu_1 - \mu_2 > 0\), where groups 1 and 2 are the heart attack and control groups, respectively.

*Minitab output that can be used for Steps 2-5*

**Step 2:** The test statistic is given in the last line of the output as *t* = 6.15, with 37 degrees of freedom. Unpooled methods are applied because the ratio of the larger to the smaller sample standard deviation is greater than 2: 47.7 / 22.3 = 2.14.

**Step 3:** The *p*-value is given as 0.000 (that is, less than 0.0005 before rounding to three decimals). Because this is a one-sided test (>), the *p*-value is the area to the right of 6.15 under a *t*-distribution with *df* = 37. We could also use a t-table to find a range for this *p*-value.

**Steps 4 and 5:** The *p*-value is less than 0.05, so we decide in favor of the alternative hypothesis. We conclude that mean cholesterol is higher for those who have recently had a heart attack.
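The area described in Step 3 can also be approximated numerically instead of read from a table. The sketch below is not how Minitab computes it; it integrates the *t* density with a simple trapezoid rule over a truncated range, which is an assumption that works for moderately large degrees of freedom such as the 37 used here. The helper names `t_pdf` and `upper_tail` are hypothetical.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t: float, df: float, width: float = 200.0, steps: int = 100_000) -> float:
    """Approximate P(T > t) with the trapezoid rule on [t, t + width].

    Truncating at t + width is an assumption: the neglected tail is
    negligible for moderate df but not for very small df (e.g. df = 1).
    """
    h = width / steps
    total = 0.5 * (t_pdf(t, df) + t_pdf(t + width, df))
    for i in range(1, steps):
        total += t_pdf(t + i * h, df)
    return total * h


# One-sided p-value for Example 1: area to the right of t = 6.15 with df = 37
p_value = upper_tail(6.15, 37)
print(f"{p_value:.2e}")  # far below 0.0005, which is why the output displays 0.000
```

For a two-sided test, the same tail area would be computed at the absolute value of *t* and doubled.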

#### Details for the "two-sample t-test" for comparing two means (UNPOOLED)

The test statistic is

\(t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\)

For Example 1,

\(t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}=\frac{253.9-193.1}{\sqrt{\frac{47.7^2}{28}+\frac{22.3^2}{30}}}=6.15\)
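The Example 1 arithmetic can be checked with a few lines of Python. This is a minimal sketch using only the summary statistics quoted above; the function name `unpooled_t` is hypothetical.

```python
import math


def unpooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic with separate (unpooled) variance estimates."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (mean1 - mean2) / standard_error


# Example 1: heart attack patients (group 1) versus control patients (group 2)
t_stat = unpooled_t(253.9, 47.7, 28, 193.1, 22.3, 30)
print(round(t_stat, 2))  # 6.15
```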

The degrees of freedom are found using a complicated approximation formula. You won’t have to do that calculation "by hand", but for reference the formula is:

\(DF=\frac{(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2})^2}{\frac{1}{n_1-1} (\frac{s_1^2}{n_1})^2 + \frac{1}{n_2-1} (\frac{s_2^2}{n_2})^2}\)

COMPLICATED!!! But Minitab will do this for us.
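For readers who want to see the approximation at work, the sketch below evaluates the degrees-of-freedom formula for Example 1. The function name `unpooled_df` is hypothetical. Truncating the result to a whole number reproduces the 37 quoted in Example 1; that Minitab truncates rather than rounds is an assumption inferred from that match.

```python
def unpooled_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the unpooled two-sample t-test."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Example 1: s1 = 47.7 with n1 = 28, s2 = 22.3 with n2 = 30
df = unpooled_df(47.7, 28, 22.3, 30)
print(round(df, 2))  # about 37.68
print(int(df))       # 37 after truncation
```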

**A conservative approach to calculating degrees of freedom for an unpooled two-sample test of means is to use the smaller of \(n_1 - 1\) and \(n_2 - 1\).** For Example 1, this is the smaller of 27 and 29, so DF = 27.

#### Example 2 (Pooled):

Hours spent studying per week are reported by students in a class survey. Students who say they usually sit in the front are compared to students who say they usually sit in the back.

**Step 1:** The null hypothesis is \(H_0: \mu_1 - \mu_2 = 0\) and the alternative is \(H_a: \mu_1 - \mu_2 \neq 0\), where groups 1 and 2 are front sitters and back sitters, respectively.

*Minitab output that can be used for Steps 2-5*

**Step 2:** The test statistic is given in the last line of the output as *t* = 3.75, with 191 degrees of freedom. The DF are found by \(n_1 + n_2 - 2\). Pooled methods are applied because the ratio of the larger to the smaller sample standard deviation is ≤ 2: 10.85 / 8.41 = 1.29. Again, we would first have to calculate these sample standard deviations to know whether to select the "Assume Equal Variances" option in Minitab.

**Step 3:** The *p*-value is given as 0.000. Because this is a two-sided test (≠), the *p*-value is the area to the right of 3.75 plus the area to the left of -3.75 under a *t*-distribution with *df* = 191. Again, we could use a t-table and *double* the *p*-value range for *t* = 3.75 with DF = 100 (since 100 is the closest table value to 191 without going over).

**Steps 4 and 5:** The *p*-value is less than 0.05, so we decide in favor of the alternative hypothesis. We conclude that the mean time spent studying is different for the two populations. The sample mean was clearly higher for those who sit in the front (16.4 hours per week versus 10.9 hours per week).

#### Details for the "two-sample t-test" for comparing two means (POOLED)

The test statistic is

\(t=\frac{\bar{x}_1-\bar{x}_2}{s_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\)

where

\(s_p=\sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\)

For Example 2:

\(s_p=\sqrt{\frac{(99-1)10.85^2 + (94-1)8.41^2}{99+94-2}}= 10.17\)

therefore

\(t=\frac{(16.4-10.9)-0}{10.17 \sqrt{\frac{1}{99}+ \frac{1}{94}}}=3.75\)

where degrees of freedom are

\(DF = n_1 + n_2 - 2\)
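The pooled formulas translate directly into code. The sketch below is an illustration of the formulas only, with hypothetical function names `pooled_sd` and `pooled_t`. Note that when the rounded summary values quoted for Example 2 are plugged in, the arithmetic gives \(s_p \approx 9.74\) and \(t \approx 3.92\) rather than the 10.17 and 3.75 shown above; the sketch cannot tell which of the quoted figures differs from the underlying data, so the reported Minitab values should not be expected to match exactly.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation: weighted average of the two sample variances."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t statistic, degrees of freedom) for the pooled two-sample t-test."""
    sp = pooled_sd(s1, n1, s2, n2)
    standard_error = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Example 2 summary values as quoted in the text (front sitters, back sitters)
sp = pooled_sd(10.85, 99, 8.41, 94)
t_stat, df = pooled_t(16.4, 10.85, 99, 10.9, 8.41, 94)
print(round(sp, 2), round(t_stat, 2), df)
```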

**SPECIAL NOTE: We will be calculating these values in Minitab, but I wanted you to be familiar with how Minitab calculates such statistics.**

Comparing two proportions: For proportions, the choice between "pooled" and "unpooled" is based on the hypothesis. If testing "no difference" between the two proportions, we pool the variance. If testing for a specific nonzero difference (e.g., the difference between the two proportions is 0.1, 0.02, etc., so that the value in \(H_0\) is a number other than 0), unpooled methods are used.
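The pooled-versus-unpooled choice for proportions can be sketched as follows. The standard error formulas are the usual textbook forms and are not given in this section, and the function name `two_proportion_z` and the counts in the usage lines are hypothetical, chosen only to show both branches.

```python
import math


def two_proportion_z(x1, n1, x2, n2, null_diff=0.0):
    """z statistic for H0: p1 - p2 = null_diff.

    Pools the two samples when null_diff is 0; otherwise uses the
    unpooled standard error, following the rule stated above.
    """
    p1, p2 = x1 / n1, x2 / n2
    if null_diff == 0:
        p_pool = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - null_diff) / se


# Hypothetical counts: 60 successes out of 100 versus 45 out of 100
z_pooled = two_proportion_z(60, 100, 45, 100)                    # H0: difference is 0
z_unpooled = two_proportion_z(60, 100, 45, 100, null_diff=0.1)   # H0: difference is 0.1
print(round(z_pooled, 2), round(z_unpooled, 2))
```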