7.1.4 - Example: Women’s Survey Data and Associated Confidence Intervals

Example 7-1: Women's Survey Data (Hotelling's \(T^{2}\) Test)

The data are stored in the file that can be downloaded: nutrient.txt

Using SAS

The Hotelling's \(T^{2}\) test is calculated using the SAS program as shown below.

SAS does not have a dedicated procedure for calculating Hotelling's \(T^{2}\). In this case, we have to rely on a procedure that can carry out matrix manipulation, which is enough to carry out Hotelling's \(T^{2}\) test.

Download the SAS Program: nutrient4.sas

To use this program, I recommend copying it and then making the necessary changes to use it with your own dataset, for example on a homework assignment. Only three entries need to change to use this with any dataset: first, the specified value of \(\mu_{0}\); second, the dataset you want to use; and third, the variables you want to analyze.

Download the results printed in the output: nutrient4.lst.

View the video below to see how to find the Hotelling's \(T^2\) value using the SAS statistical software application.

Using Minitab

View the video below to see how to find the Hotelling's \(T^2\) value using the Minitab statistical software application.



The recommended intake and sample mean for each nutrient are given below:

Variable Recommended Intake (\(\mu_{0}\)) Mean
Calcium 1000 mg 624.0 mg
Iron 15 mg 11.1 mg
Protein 60 g 65.8 g
Vitamin A 800 μg 839.6 μg
Vitamin C 75 mg 78.9 mg

as well as the sample variance-covariance matrix:

\(S = \left(\begin{array}{rrrrr}157829.4 & 940.1 & 6075.8 & 102411.1 & 6701.6 \\ 940.1 & 35.8 & 114.1 & 2383.2 & 137.7 \\ 6075.8 & 114.1 & 934.9 & 7330.1 & 477.2 \\ 102411.1 & 2383.2 & 7330.1 & 2668452.4 & 22063.3 \\ 6701.6 & 137.7 & 477.2 & 22063.3 & 5416.3 \end{array}\right)\)

Hotelling’s T-square comes out to be:

\(T^2 = 1758.54\)
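As a rough cross-check of the quadratic form behind this statistic, \(T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\), the sketch below evaluates it in plain Python. It is not the course's SAS program: the helper names `solve` and `hotelling_t2` are hypothetical, and the inputs are the rounded table values from this section, so the result will not reproduce the reported 1758.54 exactly.

```python
# Sketch only: rounded inputs copied from the tables in this section.
n = 737
xbar = [624.0, 11.1, 65.8, 839.6, 78.9]    # sample means (rounded)
mu0 = [1000.0, 15.0, 60.0, 800.0, 75.0]    # recommended intakes
S = [
    [157829.4, 940.1, 6075.8, 102411.1, 6701.6],
    [940.1, 35.8, 114.1, 2383.2, 137.7],
    [6075.8, 114.1, 934.9, 7330.1, 477.2],
    [102411.1, 2383.2, 7330.1, 2668452.4, 22063.3],
    [6701.6, 137.7, 477.2, 22063.3, 5416.3],
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for k in range(col, size + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * size
    for i in range(size - 1, -1, -1):  # back substitution
        tail = sum(m[i][k] * x[k] for k in range(i + 1, size))
        x[i] = (m[i][size] - tail) / m[i][i]
    return x


def hotelling_t2(xbar, mu0, s, n):
    """Return n * d' S^{-1} d, where d = xbar - mu0."""
    d = [xi - mi for xi, mi in zip(xbar, mu0)]
    w = solve(s, d)  # w = S^{-1} d without forming the inverse
    return n * sum(di * wi for di, wi in zip(d, w))


t2 = hotelling_t2(xbar, mu0, S, n)
print(f"T^2 from rounded inputs: {t2:.2f}")
```

Solving the linear system instead of inverting \(S\) keeps the sketch short and avoids forming the inverse explicitly.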

The F-statistic is:

\(F = 349.80 > 3.042 = F_{5,732,0.01}\)

For a 0.01-level test, the critical value is approximately 3.042. Because 349.80 is greater than this value, we can reject the null hypothesis that the average dietary intake meets recommendations.

\((T^2 = 1758.54; F = 349.80; d.f. = 5,732; p < 0.0001)\)

The SAS program reports the p-value as 0.00. A p-value can never be exactly zero, so it is preferable to state that the p-value is less than 0.0001.
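The conversion from \(T^2\) to the F-statistic can be checked with a few lines of Python. This is a minimal sketch that takes the reported \(T^2\) and the critical value 3.042 from the text as given rather than computing them; the variable names are illustrative.

```python
# Values taken from the text; the critical value is read from an F-table.
n, p = 737, 5
t2 = 1758.54
f_crit = 3.042  # F_{5,732,0.01} as quoted in the text

# F = T^2 * (n - p) / (p * (n - 1)), with p and n - p degrees of freedom
f_stat = t2 * (n - p) / (p * (n - 1))
reject = f_stat > f_crit

print(f"F = {f_stat:.2f} on ({p}, {n - p}) d.f.; reject H0: {reject}")
```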



For all women between 25 and 50 years old, the average dietary intake does not meet recommended standards. Returning to the table of sample means and recommended dietary intake, it appears that women fail to meet nutritional standards for calcium and iron, and perhaps exceed intakes for protein, vitamin A and vitamin C.

Such a statement, however, is not entirely backed up by the evidence.


 A Question Emerges...

For which nutrients do women fall significantly below recommended nutritional intake levels? Or, conversely, for what nutrients do women fall significantly above recommended nutritional intake levels?

A naive approach to addressing the above is to calculate Confidence Intervals for each of the nutritional intake levels, one-at-a-time, using the univariate method as shown below:

\(\bar{x}_j \pm t_{n-1, \alpha/2} \sqrt{s^2_j/n}\)

If we consider only a single variable, we can say with \((1 - α) × 100\%\) confidence that the interval includes the corresponding population mean.

Example 7-2: Women’s Health Survey

A one-at-a-time 95% confidence interval for calcium is given by the following where values are substituted into the formula and the calculations are worked out as shown below:

\(624.04925 \pm t_{736, 0.025}\sqrt{157829.4/737}\)

\(624.04925 \pm 1.96 \times 14.63390424\)

\(624.04925 \pm 28.68245\)

\((595.3668, 652.7317)\)
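The arithmetic above can be reproduced with a short Python sketch. It assumes the critical value 1.96 used in the worked calculation and the calcium mean and variance quoted in this section; the variable names are illustrative.

```python
from math import sqrt

# Inputs from the worked example for calcium
n = 737
xbar_calcium = 624.04925
var_calcium = 157829.4
t_crit = 1.96  # t_{736, 0.025} as used in the worked calculation

std_err = sqrt(var_calcium / n)   # standard error of the sample mean
margin = t_crit * std_err
lower, upper = xbar_calcium - margin, xbar_calcium + margin

print(f"one-at-a-time 95% CI: ({lower:.4f}, {upper:.4f})")
```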

The one-at-a-time confidence intervals are summarized in the table below:

Variable \(\mu_{0}\) 95% Confidence Interval
Calcium 1000 mg 595.3, 652.8
Iron 15 mg 10.7, 11.6
Protein 60 g 63.6, 68.0
Vitamin A 800 μg 721.5, 957.8
Vitamin C 75 mg 73.6, 84.2

Looking at this table, it appears that the average daily intakes of calcium and iron are too low (because the intervals fall below the recommended intakes of these variables), and the average daily intake of protein is too high (because the interval falls above the recommended intake of protein).

Problem: These one-at-a-time intervals do not control the family-wise error rate.

Consequence: We are less than 95% confident that all of the intervals simultaneously cover their respective means.
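To get a feel for how far below 95% the joint coverage can fall, the sketch below computes two reference values for five 95% intervals. The independence calculation is an assumption made purely for illustration (nutrient intakes are correlated); the Bonferroni inequality gives a lower bound that holds regardless of the correlation.

```python
# Five one-at-a-time intervals, each with 95% individual coverage
p = 5
alpha = 0.05

# Illustrative assumption: if the five intervals were independent,
# the chance that all of them cover would be the product of coverages.
joint_if_independent = (1 - alpha) ** p

# Bonferroni inequality: joint coverage is at least 1 - p * alpha,
# whatever the dependence between the intervals.
bonferroni_lower_bound = 1 - p * alpha

print(f"joint coverage if independent: {joint_if_independent:.4f}")
print(f"guaranteed lower bound:        {bonferroni_lower_bound:.2f}")
```

Under either view, the joint confidence is well short of the nominal 95% attached to each interval on its own.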

To fix this problem, we can calculate a \((1 - \alpha) × 100\%\) confidence ellipse for the population mean vector \(\boldsymbol{\mu}\). To calculate this confidence ellipse, recall that for independent random observations from a multivariate normal distribution with mean vector \(\boldsymbol{\mu}\) and variance-covariance matrix \(\Sigma\), the statistic shown below is F-distributed with \(p\) and \(n-p\) degrees of freedom:

\(F = n\mathbf{(\bar{x} - \boldsymbol{\mu})'S^{-1}(\bar{x} - \boldsymbol{\mu})}\dfrac{(n-p)}{p(n-1)} \sim F_{p,n-p}\)

This next expression says that the probability that n times the squared Mahalanobis distance between the sample mean vector, \(\boldsymbol{\bar{x}}\), and the population mean vector \(\boldsymbol{\mu}\) is less than or equal to p times n-1 times the critical value from the F-table divided by n-p is equal to \(1 - α\).

\(\text{Pr}\{n\mathbf{(\bar{x} - \boldsymbol{\mu})'S^{-1}(\bar{x} - \boldsymbol{\mu})} \le \dfrac{p(n-1)}{(n-p)}F_{p,n-p,\alpha}\} = 1-\alpha\)

Here the squared Mahalanobis distance between \(\bar{\mathbf{x}}\) and \(\boldsymbol{\mu}\) is being used.

Note! A closely-related equation for a hyper-ellipse is:

\(n\mathbf{(\bar{x} -\boldsymbol{\mu})'S^{-1}(\bar{x} - \boldsymbol{\mu})} = \dfrac{p(n-1)}{(n-p)}F_{p,n-p, \alpha}\)

In particular, this is the \((1 - \alpha) × 100\%\) confidence ellipse for the population mean, \(\boldsymbol{\mu}\).


The \((1 - \alpha) × 100\%\) confidence ellipse is very similar to the prediction ellipse that we saw earlier in our discussion of the multivariate normal distribution. A \((1 - \alpha) × 100\%\) confidence ellipse yields simultaneous \((1 - \alpha) × 100\%\) confidence intervals for all linear combinations of the variable means. Consider linear combinations of population means as below:

\(c_1\mu_1 + c_2\mu_2 + \dots + c_p \mu_p = \sum_{j=1}^{p}c_j\mu_j = \mathbf{c'\boldsymbol{\mu}}\)

The simultaneous \((1 - \alpha) × 100\%\) confidence intervals for the above are given by the expression below:

\(\sum_{j=1}^{p}c_j\bar{x}_j \pm \sqrt{\frac{p(n-1)}{(n-p)}F_{p, n-p, \alpha}}\sqrt{\frac{1}{n}\sum_{j=1}^{p}\sum_{k=1}^{p}c_jc_ks_{jk}}\)

In terms of interpreting the \((1 - \alpha) × 100\%\) confidence ellipse, we can say that we are \((1 - \alpha) × 100\%\) confident that all such confidence intervals cover their respective linear combinations of the population means, regardless of which linear combinations we may wish to consider. In particular, we can consider the trivial linear combinations that correspond to the individual variables. So we are also \((1 - \alpha) × 100\%\) confident that all of the intervals given in the expression below:

\(\bar{x}_j \pm \sqrt{\frac{p(n-1)}{(n-p)}F_{p, n-p, \alpha}}\sqrt{\frac{s^2_j}{n}}\)

cover their respective population means. These intervals are called simultaneous confidence intervals.

Example 7-3: Women’s Health Survey

Simultaneous confidence intervals can be computed by hand. For calcium, we substitute the following values: the sample mean is 624.04925; we have p = 5 variables and a total sample size of n = 737; and the F-table critical value for 5 and 732 degrees of freedom at \(\alpha = 0.05\) is 2.21. The standard error of the sample mean for calcium is the square root of the ratio of the sample variance for calcium, 157,829.44, to the sample size, 737. Carrying out the math gives an interval running from approximately 575.27 to 672.83, as shown below:

\(624.04925 \pm \sqrt{\frac{5(737-1)}{737-5}\underset{2.21}{\underbrace{F_{5,737-5,0.05}}}}\sqrt{157829.4/737}\)

\(624.04925 \pm 3.333224 \times 14.63390424\)

\(624.04925 \pm 48.77808\)

\((575.27117, 672.82733)\)
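The general expression for a linear combination \(\mathbf{c'}\boldsymbol{\mu}\) can be sketched in Python as follows. The function name `simultaneous_interval` is hypothetical, the F critical value 2.21 is taken from the text rather than computed, and the means other than calcium are the rounded table values. Choosing \(\mathbf{c} = (1, 0, 0, 0, 0)\) recovers the calcium interval worked out above.

```python
from math import sqrt

n = 737
f_crit = 2.21  # F_{5,732,0.05} as quoted in the text
xbar = [624.04925, 11.1, 65.8, 839.6, 78.9]  # calcium mean unrounded
S = [
    [157829.4, 940.1, 6075.8, 102411.1, 6701.6],
    [940.1, 35.8, 114.1, 2383.2, 137.7],
    [6075.8, 114.1, 934.9, 7330.1, 477.2],
    [102411.1, 2383.2, 7330.1, 2668452.4, 22063.3],
    [6701.6, 137.7, 477.2, 22063.3, 5416.3],
]


def simultaneous_interval(c, xbar, s, n, f_crit):
    """Simultaneous interval for the linear combination c'mu."""
    p = len(c)
    center = sum(cj * xj for cj, xj in zip(c, xbar))
    # c'Sc written out as the double sum over j and k
    quad = sum(c[j] * c[k] * s[j][k] for j in range(p) for k in range(p))
    multiplier = sqrt(p * (n - 1) / (n - p) * f_crit)
    margin = multiplier * sqrt(quad / n)
    return center - margin, center + margin


# Trivial combination that picks out calcium
lo_sim, hi_sim = simultaneous_interval([1, 0, 0, 0, 0], xbar, S, n, f_crit)
print(f"simultaneous 95% CI for calcium: ({lo_sim:.5f}, {hi_sim:.5f})")
```

Any other coefficient vector can be passed in the same way, and the resulting intervals all share the same simultaneous confidence level.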

Using SAS

These calculations may also be carried out using the SAS program.

SAS Program can be downloaded: nutrient5.sas

To use this program, there are only a couple of things you need to change: the value of p in line five and what appears in the data step at the top of the page.


To use the program with other datasets, create a separate line starting with "variable" for each variable that is included in the analysis. In this case, we have five nutritional variables, so we have five of these lines in the first data step. Set this equal to, in quotes, the name you wish to give that variable. After the semicolon, set x equal to the name specified for that variable in the input statement and finally, after another semicolon, type "output;".

The output is available to download here: nutrient5.lst.

The output contains the sample means for each of the variables and gives the results of the calculations under data step "b".

The confidence limits are given in the columns "losim" and "upsim".

Using Minitab

At this time Minitab does not support this procedure.

The following table gives the confidence intervals:

Variable \(μ_{0}\) 95% Confidence Interval
Calcium 1000 mg 575.1, 673.0
Iron 15 mg 10.4, 11.9
Protein 60 g 62.0, 69.6
Vitamin A 800 μg 638.3, 1040.9
Vitamin C 75 mg 69.9, 88.0

Looking at these simultaneous confidence intervals, we can see that the upper bound of the interval for calcium falls below the recommended daily intake of 1000 mg. Similarly, the upper bound for iron falls below the recommended daily intake of 15 mg. Conversely, the lower bound for protein falls above the recommended daily intake of 60 g. The intervals for vitamins A and C both contain their recommended daily intakes.

Therefore, we can conclude that the average daily intakes of calcium and iron fall below the recommended levels, and the average daily intake of protein exceeds the recommended level.

Problem with Simultaneous Confidence Intervals

The problem with the simultaneous confidence intervals is that if we are not interested in all possible linear combinations of variables or anything other than the means by themselves, then the simultaneous confidence interval may be too wide, and hence, too conservative. As an alternative to the simultaneous confidence intervals, we can use the Bonferroni intervals.

Bonferroni Intervals

The Bonferroni intervals are given in the expression below:

\(\bar{x}_j \pm t_{n-1, \frac{\alpha}{2p}}\sqrt{s^2_j/n}\)

An example from the USDA Women’s Health Survey data will illustrate this calculation.

Example 7-4: Women’s Health Survey

Here, the 95% confidence interval for calcium under the Bonferroni correction is calculated:

\(624.04925 \pm t_{737-1, \frac{0.05}{2 \times 5}}\sqrt{157829.4/737}\)

\(624.04925 \pm 2.576 \times 14.63390424\)

\(624.04925 \pm 37.69694\)

\((586.35231, 661.74619)\)

This calculation uses the sample mean for calcium, 624.04925, and the critical value from the t-table with n-1 degrees of freedom, evaluated at alpha divided by 2 times p (0.05 divided by 2 × 5, or 0.005). This critical value turns out to be 2.576 from the t-table. The standard error is calculated by taking the square root of the ratio of the sample variance, 157,829.44, to the sample size, 737.

Carrying out the math, the interval goes from 586.35 to 661.75.
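The same arithmetic can be written as a short Python sketch. It assumes the critical value 2.576 read from the t-table in the text; only the critical value differs from the one-at-a-time interval, since the tail area is \(\alpha/(2p)\) instead of \(\alpha/2\).

```python
from math import sqrt

# Inputs from the worked example for calcium
n, p = 737, 5
alpha = 0.05
xbar_calcium = 624.04925
var_calcium = 157829.4

tail_area = alpha / (2 * p)  # 0.005 in each tail after the correction
t_crit = 2.576               # t_{736, 0.005} as quoted in the text

margin = t_crit * sqrt(var_calcium / n)
lobon, upbon = xbar_calcium - margin, xbar_calcium + margin

print(f"Bonferroni 95% CI for calcium: ({lobon:.5f}, {upbon:.5f})")
```

The names `lobon` and `upbon` simply mirror the column labels in the SAS output.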

Using SAS

These calculations can also be obtained from the SAS program that can be downloaded below. The lower and upper Bonferroni limits are given by "lobon" and "upbon" at the end of data step "b".


SAS Program can be downloaded here: nutrient5.sas

Using Minitab

At this time Minitab does not support this procedure.


The table below shows the results of the computation

Variable \(\mu_{0}\) 95% Confidence Interval
Calcium 1000 mg 586.3, 661.8
Iron 15 mg 10.6, 11.7
Protein 60 g 62.9, 68.7
Vitamin A 800 μg 684.2, 995.0
Vitamin C 75 mg 71.9, 85.9

When compared to the simultaneous intervals, the Bonferroni intervals are narrower. In this case, however, the conclusions are the same. The confidence intervals for vitamins A and C both cover their respective recommended daily intakes. The intervals for calcium and iron fall below the recommended intakes, while the interval for protein falls above it.
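The comparison can be made mechanical with a small Python sketch that reads the interval tables from this section and classifies each nutrient. The helper name `verdict` and the dictionary layout are illustrative choices, not part of the SAS output.

```python
# Recommended intakes and interval endpoints copied from the tables
mu0 = {"Calcium": 1000, "Iron": 15, "Protein": 60,
       "Vitamin A": 800, "Vitamin C": 75}
simultaneous = {"Calcium": (575.1, 673.0), "Iron": (10.4, 11.9),
                "Protein": (62.0, 69.6), "Vitamin A": (638.3, 1040.9),
                "Vitamin C": (69.9, 88.0)}
bonferroni = {"Calcium": (586.3, 661.8), "Iron": (10.6, 11.7),
              "Protein": (62.9, 68.7), "Vitamin A": (684.2, 995.0),
              "Vitamin C": (71.9, 85.9)}


def verdict(interval, target):
    """Say where the interval sits relative to the recommended intake."""
    lower, upper = interval
    if upper < target:
        return "below"
    if lower > target:
        return "above"
    return "covers"


results = {}
for name, target in mu0.items():
    sim_width = simultaneous[name][1] - simultaneous[name][0]
    bon_width = bonferroni[name][1] - bonferroni[name][0]
    results[name] = (verdict(simultaneous[name], target),
                     verdict(bonferroni[name], target),
                     bon_width < sim_width)
    print(name, results[name])
```

Each tuple holds the simultaneous verdict, the Bonferroni verdict, and whether the Bonferroni interval is the narrower of the two.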