Lesson 6: Multivariate Conditional Distribution and Partial Correlation

Overview

In a multivariable setting, partial correlations are used to explore the relationships between pairs of variables after we take into account the values of other variables.

For example, in a study of the relationship between blood pressure and blood cholesterol, it might be thought that both of these variables are related to the age of the subject. That is, we might be interested in looking at the correlation between these two variables for subjects of the same age.

Objectives

Upon completion of this lesson, you should be able to:

  • Construct a conditional distribution;
  • Understand the definition of a partial correlation;
  • Compute partial correlations using SAS and Minitab;
  • Test the hypothesis that the partial correlation is equal to zero, and draw appropriate conclusions from that test;
  • Compute and interpret confidence intervals for partial correlations.

6.1 - Conditional Distributions

Partial correlations may only be defined after introducing the concept of conditional distributions. We will restrict ourselves to conditional distributions from multivariate normal distributions only.

If we have a p × 1 random vector \(\mathbf{Z}\), we can partition it into two random vectors \(\mathbf{X}_1\) and \(\mathbf{X}_2\) where \(\mathbf{X}_1\) is a p1 × 1 vector and \(\mathbf{X}_2\) is a p2 × 1 vector as shown in the expression below:

\(\textbf{Z} = \left(\begin{array}{c} \textbf{X}_1 \\ \textbf{X}_2\end{array}\right)\)

Conditional Distribution Properties

Further, suppose that we partition the mean vector and covariance matrix in a corresponding manner. That is,

\(\boldsymbol{\mu} = \left(\begin{array}{c}\boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2\end{array}\right)\)  and  \(\mathbf{\Sigma} = \left(\begin{array}{cc}\mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12}\\ \mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22} \end{array}\right)\)

For instance, \(\boldsymbol{\mu}_{1}\) gives the means for the variables in the vector \(\mathbf{X}_{1}\), and \(\Sigma _ { 11 }\) gives variances and covariances for vector \(\mathbf{X}_{1}\). The matrix  \(\Sigma _ { 12 }\) gives covariances between variables in vector \(\mathbf{X_{1}}\) and vector \(\mathbf{X_{2}}\) (as does matrix \(\Sigma _ { 21 }\)).

Any distribution for a subset of variables from a multivariate normal, conditional on known values for another subset of variables, is a multivariate normal distribution.

Conditional Distribution
The conditional distribution of \(\mathbf{X}_{1}\) given known values for \(\mathbf{X}_2=\mathbf{x}_{2}\) is a multivariate normal with:
\begin{align} \text{mean vector} & =  \mathbf{\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)}\\ \text{covariance matrix} & = \mathbf{\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}} \end{align}
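
As a minimal sketch of the two formulas above, the following NumPy function computes the conditional mean vector and covariance matrix from a partitioned mean vector and covariance matrix. The function name `conditional_mvn`, its index arguments, and the three-variable numbers are illustrative assumptions and are not part of the lesson.

```python
import numpy as np

def conditional_mvn(mu, sigma, idx1, idx2, x2):
    """Mean and covariance of X1 | X2 = x2 for a multivariate normal.

    idx1 and idx2 give the positions of X1 and X2 within the full vector.
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    s11 = sigma[np.ix_(idx1, idx1)]
    s12 = sigma[np.ix_(idx1, idx2)]
    s22 = sigma[np.ix_(idx2, idx2)]
    # B = Sigma_12 Sigma_22^{-1}, obtained by solving a linear system
    # rather than forming the inverse explicitly (Sigma_22 is symmetric).
    b = np.linalg.solve(s22, s12.T).T
    cond_mean = mu[idx1] + b @ (np.asarray(x2, dtype=float) - mu[idx2])
    cond_cov = s11 - b @ s12.T
    return cond_mean, cond_cov

# Hypothetical three-variable example: condition the first two
# variables on the third taking the value 4.
mu = [1.0, 2.0, 3.0]
sigma = [[4.0, 1.0, 2.0],
         [1.0, 3.0, 1.0],
         [2.0, 1.0, 5.0]]
cond_mean, cond_cov = conditional_mvn(mu, sigma, idx1=[0, 1], idx2=[2], x2=[4.0])
print(cond_mean)
print(cond_cov)
```

Note that the conditional covariance does not depend on the value of x2; only the conditional mean does.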

Bivariate Case

Suppose that we have p = 2 variables with a multivariate normal distribution. The conditional distribution of \(X_{1}\) given knowledge of \(x_{2}\) is a normal distribution with

\begin{align} \text{Mean} & =  \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2-\mu_2) \\ \text{Variance} & = \sigma_{11}- \frac{\sigma^2_{12}}{\sigma_{22}}\end{align}

Example 6-1: Conditional Distribution of Weight Given Height for College Men

Suppose that the weights (lbs) and heights (inches) of undergraduate college men have a multivariate normal distribution with mean vector \(\boldsymbol{\mu} = \left(\begin{array}{c} 175\\ 71 \end{array}\right)\) and covariance matrix \(\mathbf{\Sigma} = \left(\begin{array}{cc} 550 & 40\\ 40 & 8 \end{array}\right)\).

The conditional distribution of \(X_{1}\) = weight given \(x_{2}\) = height is a normal distribution with

\begin{align} \text{Mean} &= \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2-\mu_2)\\[5pt] &= 175 + \frac{40}{8}(x_2-71) \\[5pt] &= -180+5x_2 \end{align}

\begin{align} \text{Variance} &= \sigma_{11}- \frac{\sigma^2_{12}}{\sigma_{22}}\\ &= 550-\frac{40^2}{8} \\[5pt] &= 350 \end{align}

For instance, for men with height = 70, weights are normally distributed with mean = -180 + 5(70) = 170 pounds and variance = 350 (so the standard deviation is \(\sqrt{350} = 18.71\) pounds).

Notice that we have generated a simple linear regression model that relates weight to height.
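
The arithmetic in Example 6-1 can be checked with a few lines of Python. This is only a sketch using the numbers given in the example; the variable and function names are chosen here for illustration.

```python
import math

# Values from Example 6-1: X1 = weight (lbs), X2 = height (inches)
mu1, mu2 = 175.0, 71.0
s11, s12, s22 = 550.0, 40.0, 8.0

slope = s12 / s22                  # coefficient on height
intercept = mu1 - slope * mu2      # constant term of the conditional mean
cond_var = s11 - s12**2 / s22      # conditional variance of weight given height

def conditional_mean(height):
    """Conditional mean weight for a given height."""
    return intercept + slope * height

print(intercept, slope)                 # -180.0 5.0
print(conditional_mean(70))             # 170.0
print(round(math.sqrt(cond_var), 2))    # 18.71
```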

Conditional Means, Variances and Covariances

So far, we have only considered unconditional population means, variances, covariances, and correlations. These quantities are defined under the setting in which the subjects are sampled from the entire population. For example, blood pressure and cholesterol may be measured from a sample selected from the population of all adult citizens of the United States.

To understand partial correlations, we must first consider conditional means, variances, and covariances. These quantities are defined for some subset of the population. For example, blood pressure and cholesterol may be measured from a sample of all 51 year old citizens of the United States. Thus, we may consider the population mean blood pressure of 51 year old citizens. This quantity is called the conditional mean blood pressure given that the subject is a 51 year old citizen.

More than one condition may be applied. For example, we may consider the population mean blood pressure of 51 year old citizens who weigh 190 pounds. This quantity is the conditional mean blood pressure given that the subject is 51 years old and weighs 190 pounds.

Conditional Mean

Let Y denote a vector of variables (e.g., blood pressure, cholesterol, etc.) of interest, and let X denote a vector of variables on which we wish to condition (e.g., age, weight, etc.). Then the conditional mean of Y given that X equals a particular value x (i.e., X = x) is denoted by

\(\mu_{\textbf{Y.x}} = E(\textbf{Y}|\textbf{X=x})\)

This is interpreted as the population mean of the vector Y given a sample from the subpopulation where X = x.

 

Conditional Variance

Let Y denote a variable of interest, and let X denote a vector of variables on which we wish to condition. Then the conditional variance of Y given that X = x is

\(\sigma^2_{\textbf{Y.x}} = \text{var}(\mathbf{Y}|\textbf{X=x}) = E\{(\mathbf{Y}-\boldsymbol{\mu}_{\textbf{Y.x}})^2|\textbf{X=x}\}\)

Because Y is random, so is \(\left( \mathbf{Y} - \boldsymbol{\mu}_{\textbf{Y.x}} \right) ^ { 2 }\), and hence \(\left( \mathbf{Y} - \boldsymbol{\mu}_{\textbf{Y.x}} \right) ^ { 2 }\) has a conditional mean. This can be interpreted as the variance of Y given a sample from the subpopulation where X = x.

 

Conditional Covariance

Let \(Y_{i}\) and \(Y_{j}\) denote two variables of interest, and let X denote a vector of variables on which we wish to condition. Then the conditional covariance between \(Y_{i}\) and \(Y_{j}\) given that X = x is

\(\sigma_{i,j.\textbf{x}} = \text{cov}(Y_i, Y_j| \textbf{X=x}) = E\{(Y_i-\mu_{Y_i.x})(Y_j-\mu_{Y_j.x})|\textbf{X=x}\}\)

Because \(Y_{i}\) and \(Y_{j}\) are random, so is \(\left( Y_{ i } - \mu_{ Y_i.x } \right) \left( Y_{ j } - \mu_{ Y_j.x } \right)\) and hence \(\left( Y_{ i } - \mu_{ Y_i.x } \right) \left( Y_{ j } - \mu_{ Y_j.x } \right)\) has a conditional mean. This can be interpreted as the covariance between \(Y_{i}\) and \(Y_{j}\) given a sample from the subpopulation where X = x.

Just as the unconditional variances and covariances can be collected into a variance-covariance matrix \(Σ\), the conditional variances and covariances can be collected into a conditional variance-covariance matrix:

\(\mathbf{\Sigma_{Y.x}}= \text{var}\mathbf{(Y|X=x)} = \left(\begin{array}{cccc}\sigma^2_{Y_1\textbf{.X}} & \sigma_{12\textbf{.X}} & \dots & \sigma_{1p\textbf{.X}}\\ \sigma_{21\textbf{.X}} & \sigma^2_{Y_2 \textbf{.X}} & \dots & \sigma_{2p \textbf{.X}} \\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p1 \textbf{.X}} & \sigma_{p2 \textbf{.X}} & \dots & \sigma^2_{Y_p\textbf{.X}} \end{array}\right)\)

Partial Correlation

The partial correlation between \(Y_{j}\) and \(Y_{k}\) given X = x is:

\[\rho_{jk\textbf{.X}} = \dfrac{\sigma_{jk\text{.X}}}{\sigma_{Y_j\textbf{.X}}\sigma_{Y_k \textbf{.X}}}\]

Note! This is computed in the same way as unconditional correlations, replacing unconditional variances and covariances with conditional variances and covariances. This can be interpreted as the correlation between \(Y_{j}\) and \(Y_{k}\) given a sample from the subpopulation where X = x.

The Multivariate Normal Distribution

Next, let us return to the multivariate normal distribution. Suppose that a random vector Z is partitioned into components X and Y and follows a multivariate normal distribution whose mean vector has corresponding components \(\boldsymbol{\mu}_{X}\) and \(\boldsymbol{\mu}_{Y}\), and whose variance-covariance matrix is partitioned into four parts as shown below:

 

\(\textbf{Z} = \left(\begin{array}{c}\textbf{X}\\ \textbf{Y} \end{array}\right) \sim N \left(\left(\begin{array}{c}\boldsymbol{\mu}_X\\\boldsymbol{\mu}_Y \end{array}\right), \left(\begin{array}{cc} \mathbf{\Sigma_{X}} & \mathbf{\Sigma_{XY}}\\ \mathbf{\Sigma_{YX}} & \mathbf{\Sigma_Y} \end{array}\right)\right)\)

 

Here, \(\mathbf{\Sigma_{X}}\) is the variance-covariance matrix for the random vector X, and \(\mathbf{\Sigma_Y}\) is the variance-covariance matrix for the random vector Y. The matrix \(\mathbf{\Sigma_{YX}}\) contains the covariances between the elements of Y and the elements of X, and \(\mathbf{\Sigma_{XY}}\) is its transpose.

Then the conditional distribution of Y given that X takes a particular value x is also going to be a multivariate normal with conditional expectation as shown below:

\(E(\textbf{Y}|\textbf{X=x}) = \mathbf{\mu_Y} + \mathbf{\Sigma_{YX}\Sigma^{-1}_X}(\mathbf{x}-\boldsymbol{\mu}_X)\)

Note that this is equal to the mean of Y plus an adjustment. This adjustment involves the covariances between X and Y, the inverse of the variance-covariance matrix of X, and the difference between the value x and the mean of the random vector X. If x is equal to \(\boldsymbol{\mu}_{X}\), then the conditional expectation of Y given X = x is simply equal to the ordinary mean of Y.

In general, if there are positive covariances between the X's and Y's, then a value of X greater than \(\boldsymbol{\mu}_{X}\) will result in a positive adjustment in the calculation of this conditional expectation. Conversely, if X is less than \(\boldsymbol{\mu}_{X}\), then we will end up with a negative adjustment.

The conditional variance-covariance matrix of Y given that X = x is equal to the variance-covariance matrix for Y minus the term that involves the covariances between X and Y and the variance-covariance matrix for X. For now we will call this conditional variance-covariance matrix A as shown below:

\(\text{var}(\textbf{Y|X=x}) = \mathbf{\Sigma_Y - \Sigma_{YX}\Sigma^{-1}_X\Sigma_{XY}} = \textbf{A}\)

We are finally now ready to define the partial correlation between two variables \(Y_{j}\) and \(Y_{k}\) given that the random vector X = x. This is shown in the expression below:

\(\rho_{jk\textbf{.x}} = \dfrac{a_{jk}}{\sqrt{a_{jj}a_{kk}}}\)

This is basically the same formula that we would have for the ordinary correlation, in this case calculated using the conditional variance-covariance matrix in place of the ordinary variance-covariance matrix.

Partial correlations can be estimated by substituting the sample variance-covariance matrices for the population variance-covariance matrices, as shown in the expression below:

\(\widehat{\text{var}}(\textbf{Y|X=x}) = \mathbf{S_Y - S_{YX}S^{-1}_XS_{XY}}= \hat{\textbf{A}}\)

where

\(\mathbf{S} = \left(\begin{array}{cc} \mathbf{S_X} & \mathbf{S_{XY}}\\ \mathbf{S_{YX}} & \mathbf{S_Y}\end{array}\right)\)

is the sample variance-covariance matrix of the data.

Then the elements of the estimated conditional variance-covariance matrix can be used to obtain the partial correlation as shown below:

\(r_{jk\textbf{.x}} = \dfrac{\hat{a}_{jk}}{\sqrt{\hat{a}_{jj}\hat{a}_{kk}}}\)

If we are conditioning on just a single variable, a simpler expression is available. The partial correlation between \(Y_{j}\) and \(Y_{k}\) given \(Y_{i} = y_{i}\) is estimated by:

\(r_{jk.i} = \dfrac{r_{jk}-r_{ij}r_{ik}}{\sqrt{(1-r^2_{ij})(1-r^2_{ik})}}\)
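
The two routes to a sample partial correlation described above (through the estimated conditional variance-covariance matrix, and through the single-variable formula) can be compared numerically. The sketch below assumes NumPy; the function name `partial_corr` and the 3 × 3 correlation matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def partial_corr(S, j, k, cond):
    """Partial correlation between variables j and k given the variables
    listed in cond, computed from a covariance or correlation matrix S."""
    S = np.asarray(S, dtype=float)
    y = [j, k]
    s_y = S[np.ix_(y, y)]
    s_yx = S[np.ix_(y, cond)]
    s_x = S[np.ix_(cond, cond)]
    # A-hat = S_Y - S_YX S_X^{-1} S_XY
    a = s_y - s_yx @ np.linalg.solve(s_x, s_yx.T)
    return a[0, 1] / np.sqrt(a[0, 0] * a[1, 1])

# Hypothetical correlation matrix for variables (i, j, k) = (0, 1, 2)
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.7],
              [0.5, 0.7, 1.0]])

# Route 1: conditional variance-covariance matrix
r_matrix = partial_corr(R, j=1, k=2, cond=[0])

# Route 2: single-variable formula
r_ij, r_ik, r_jk = R[0, 1], R[0, 2], R[1, 2]
r_shortcut = (r_jk - r_ij * r_ik) / np.sqrt((1 - r_ij**2) * (1 - r_ik**2))

print(r_matrix, r_shortcut)
```

Both routes should agree up to floating-point error, because a correlation matrix is a covariance matrix of standardized variables and the partial correlation is unaffected by rescaling.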


6.2 - Example: Wechsler Adult Intelligence Scale

Example 6-2: Wechsler Adult Intelligence Scale

To illustrate these calculations we will return to the Wechsler Adult Intelligence Scale data.

This dataset includes data on n = 37 subjects taking the Wechsler Adult Intelligence Test. This test is broken up into four components:

  • Information
  • Similarities
  • Arithmetic
  • Picture Completion

Recall from the last lesson that the correlation between Information and Similarities was \(r = 0.77153\).

Using SAS

The partial correlation between Information and Similarities given Arithmetic and Picture Completion may be computed using the SAS program shown below.

Download the SAS program: wechsler2.sas

Download the SAS Output: wechsler2.lst

 


Analysis

The output is in two tables. The first table gives the conditional variance-covariance matrix for Information and Similarities given Arithmetic and Picture Completion. The second table gives the partial correlation. Here we can see that the partial correlation is:

\(r = 0.71188\)

Conclusion: Comparing this to the previous value for the ordinary correlation, we can see that the partial correlation is not much smaller than the ordinary correlation. This suggests that little of the relationship between Information and Similarities can be explained by performance on the Arithmetic and Picture Completion portions of the test.

 

Interpretation

Partial correlations should be compared to the corresponding ordinary correlations. When interpreting partial correlations, three results can potentially occur. Each of these results yields a different interpretation.

  1. Partial and ordinary correlations are approximately equal. This occurred in our present setting. This suggests that the relationship between the variables of interest cannot be explained by the remaining explanatory variables upon which we are conditioning.
  2. Partial correlations are closer to zero than ordinary correlations. This is a common result and often what we anticipate. This suggests that the relationship between the variables of interest might be explained by their common relationships to the explanatory variables upon which we are conditioning. For example, we might find a strong positive ordinary correlation between blood pressure and blood cholesterol, yet a very small partial correlation between these two variables after we have taken into account the age of the subject. If this were the case, it might suggest that both variables are related to age, and that the observed correlation is due only to their common relationship to age.
  3. Partial correlations are farther from zero than ordinary correlations. This rarely happens. This situation would suggest that unless we take into account the explanatory variables upon which we are conditioning, the relationship between the variables of interest is hidden or masked.

6.3 - Testing for Partial Correlation

When discussing ordinary correlations we looked at tests for the null hypothesis that the ordinary correlation is equal to zero, against the alternative that it is not equal to zero. If that null hypothesis is rejected, then we look at confidence intervals for the ordinary correlation. Similar objectives can be considered for the partial correlation.

First, consider testing the null hypothesis that a partial correlation is equal to zero against the alternative that it is not equal to zero. This is expressed below:

\(H_0\colon \rho_{jk\textbf{.x}}=0\) against \(H_a\colon \rho_{jk\textbf{.x}}\ne 0\)

Here we will use a test statistic that is similar to the one we used for an ordinary correlation. This test statistic is shown below:

\(t = r_{jk\textbf{.x}}\sqrt{\frac{n-2-k}{1-r^2_{jk\textbf{.x}}}}\)      \(\dot{\sim}\)  \(t_{n-2-k}\)

The only difference between this and the previous one is what appears in the numerator of the radical. Before we just took n - 2. Here we take n - 2 - k, where k is the number of variables upon which we are conditioning. In our Adult Intelligence data, we conditioned on two variables so k would be equal to 2 in this case.

Under the null hypothesis, this test statistic will be approximately t-distributed, also with n - 2 - k degrees of freedom.

We would reject \(H_{0}\) if the absolute value of the test statistic exceeds the critical value from the t-table evaluated at \(\alpha/2\):

\(|t| > t_{n-2-k, \alpha/2}\)

Example 6-3: Wechsler Adult Intelligence Data

For the Wechsler Adult Intelligence Data we found a partial correlation of 0.711879, which we enter into the expression for the test statistic as shown below:

\(t = 0.711879 \sqrt{\dfrac{37-2-2}{1-0.711879^2}}=5.82\)

Here the sample size n = 37 and the k = 2 variables upon which we are conditioning are substituted into the expression. Carrying out the arithmetic gives a test statistic of 5.82, as shown above.

Here we want to compare this value to a t-distribution with 33 degrees of freedom for an α = 0.01 level test. Therefore, we look up the critical value for 0.005 in the table. Because 33 df does not appear in the table, we use the closest df that does not exceed 33, which is 30. In this case the critical value is 2.75, so we take \(t _ { ( d f , 1 - \alpha / 2 ) } = t _ { ( 33,0.995 ) } \) to be approximately 2.75.

Note! Some text tables provide the right tail probability (the graph at the top will have the area in the right tail shaded in) while other texts will provide a table with the cumulative probability - the graph will be shaded into the left. The concept is the same. For example, if alpha was 0.01 then using the first text you would look under 0.005 and in the second text look under 0.995.

Because \(5.82 > 2.75 = t _ { ( 33,0.995 ) }\), we can reject the null hypothesis \(H_{0}\) at the \(\alpha = 0.01\) level and conclude that there is a significant partial correlation between these two variables. In particular, we would conclude that this partial correlation is positive, indicating that even after taking into account Arithmetic and Picture Completion, there is a positive association between Information and Similarities.
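
The test statistic in this example can be reproduced with the Python standard library. This is a minimal sketch; the function name `partial_corr_t` is an assumption made here, and the critical value 2.75 is simply the table value quoted in the text rather than something the code computes.

```python
import math

def partial_corr_t(r, n, k):
    """t statistic and degrees of freedom for H0: partial correlation = 0,
    where k is the number of variables being conditioned on."""
    df = n - 2 - k
    return r * math.sqrt(df / (1 - r**2)), df

t_stat, df = partial_corr_t(r=0.711879, n=37, k=2)
t_crit = 2.75  # table value quoted in the text (df = 30, right-tail area 0.005)

print(round(t_stat, 2), df)    # 5.82 33
print(abs(t_stat) > t_crit)    # True, so H0 is rejected at alpha = 0.01
```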

Confidence Interval for the partial correlation, \(\rho_{jk\textbf{.x}}\)

The procedure here is very similar to the procedure we used for ordinary correlation.

Steps

  1. Compute the Fisher's transformation of the partial correlation using the same formula as before.

    \(z_{jk} = \dfrac{1}{2}\log \left( \dfrac{1+r_{jk\textbf{.X}}}{1-r_{jk\textbf{.X}}}\right) \)

    In this case, for large n, this Fisher-transformed variable is approximately normally distributed, with mean equal to the Fisher transform of the population value of this partial correlation and variance equal to 1/(n - 3 - k).

    \(z_{jk}\)  \(\dot{\sim}\)  \(N \left( \dfrac{1}{2}\log \dfrac{1+\rho_{jk\textbf{.X}}}{1-\rho_{jk\textbf{.X}}}, \dfrac{1}{n-3-k}\right)\)

  2. Compute a \((1 - α) × 100\%\) confidence interval for the Fisher transform of the population partial correlation:

    \( \dfrac{1}{2}\log \dfrac{1+\rho_{jk\textbf{.X}}}{1-\rho_{jk\textbf{.X}}}\)

    This yields the bounds \(Z_{l}\) and \(Z_{U}\) as before:

    \(\left(\underset{Z_l}{\underbrace{z_{jk}-\dfrac{Z_{\alpha/2}}{\sqrt{n-3-k}}}}, \underset{Z_U}{\underbrace{z_{jk}+\dfrac{Z_{\alpha/2}}{\sqrt{n-3-k}}}}\right)\)

  3. Back transform to obtain the desired confidence interval for the partial correlation - \(\rho_{jk\textbf{.X}}\)

    \(\left(\dfrac{e^{2Z_l}-1}{e^{2Z_l}+1}, \dfrac{e^{2Z_U}-1}{e^{2Z_U}+1}\right)\)

Example 6-3: Wechsler Adult Intelligence Data (Steps Shown)

The confidence interval is calculated by substituting the results from the Wechsler Adult Intelligence Data into the appropriate steps below:

Step 1: Compute the Fisher transform:

\begin{align} Z_{12} &= \dfrac{1}{2}\log \frac{1+r_{12.34}}{1-r_{12.34}}\\[5pt] &= \dfrac{1}{2} \log \frac{1+0.711879}{1-0.711879}\\[5pt] &= 0.89098 \end{align}

Step 2: Compute the 95% confidence interval for \( \frac{1}{2}\log \frac{1+\rho_{12.34}}{1-\rho_{12.34}}\) :

\begin{align} Z_l &= Z_{12}-Z_{0.025}/\sqrt{n-3-k}\\[5pt] & = 0.89098 - \dfrac{1.96}{\sqrt{37-3-2}}\\[5pt] &= 0.5445 \end{align}

\begin{align} Z_U &= Z_{12}+Z_{0.025}/\sqrt{n-3-k}\\[5pt] &= 0.89098 + \dfrac{1.96}{\sqrt{37-3-2}} \\[5pt] &= 1.2375 \end{align}

Step 3: Back-transform to obtain the 95% confidence interval for \(\rho_{12.34}\) :

\(\left(\dfrac{\exp\{2Z_l\}-1}{\exp\{2Z_l\}+1}, \dfrac{\exp\{2Z_U\}-1}{\exp\{2Z_U\}+1}\right)\)

\(\left(\dfrac{\exp\{2\times 0.5445\}-1}{\exp\{2\times 0.5445\}+1}, \dfrac{\exp\{2\times 1.2375\}-1}{\exp\{2\times 1.2375\}+1}\right)\)

\((0.4964, 0.8447)\)

Based on this result, we can conclude that we are 95% confident that the interval (0.4964, 0.8447) contains the partial correlation between Information and Similarities scores given scores on Arithmetic and Picture Completion.
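
The three steps above can be collected into one short function. The sketch below uses only the Python standard library; the name `partial_corr_ci` is hypothetical. It relies on the identities that the Fisher transform is the inverse hyperbolic tangent and that the back-transform \((e^{2z}-1)/(e^{2z}+1)\) is the hyperbolic tangent.

```python
import math
from statistics import NormalDist

def partial_corr_ci(r, n, k, level=0.95):
    """Approximate confidence interval for a partial correlation,
    where k is the number of variables being conditioned on."""
    z = math.atanh(r)                                    # Step 1: Fisher transform
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    half_width = z_crit / math.sqrt(n - 3 - k)           # Step 2: bounds on the z scale
    return math.tanh(z - half_width), math.tanh(z + half_width)  # Step 3

lower, upper = partial_corr_ci(r=0.711879, n=37, k=2)
print(round(lower, 4), round(upper, 4))
```

With the Wechsler values this should reproduce the interval reported above to about four decimal places; small differences can arise because the code uses an unrounded normal quantile instead of 1.96.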


6.4 - Summary

In this lesson we learned about:
  • Conditional means, variances, and covariances
  • The definition of the partial correlation and how it may be estimated for data sampled from a multivariate normal distribution
  • Interpretation of the partial correlation
  • Methods for testing the null hypothesis that there is zero partial correlation
  • How to compute confidence intervals for the partial correlation
