5.4.4  Conditional Independence
The concept of conditional independence is very important, and it is the basis for many statistical models (e.g., latent class models, factor analysis, item response models, and graphical models).
There are three possible conditional independence models with three random variables: (AB, AC), (AB, BC), and (AC, BC). Consider the model (AB, AC),
which means that B and C are conditionally independent given A. In mathematical terms, the model (AB, AC) means that the conditional probability of B and C given A equals the product of conditional probabilities of B given A and C given A:
\(P(B=j,C=k \mid A=i)=P(B=j \mid A=i) \times P(C=k \mid A=i)\)
In terms of odds ratios, this model implies that if we look at the partial tables, that is, the B × C tables at each level of A = 1, . . . , I, then the odds ratios in these tables should not differ significantly from 1. Tying this back to two-way tables, we can test each of the partial B × C tables at each level of A to see whether independence holds.
H_{0}: θ_{BC(A=i)} = 1 for all i
vs.
H_{A}: θ_{BC(A=i)} ≠ 1 for at least one i
Intuitively, (AB, AC) means that any relationship that may exist between B and C can be explained by A. In other words, B and C may appear to be related if A is not considered (e.g., if we look only at the marginal table B × C), but if one could control for A by holding it constant (i.e., by looking at subsets of the data having identical values of A, that is, looking at the partial tables B × C for each level of A), then any apparent relationship between B and C would disappear. Remember Simpson's paradox?! Marginal and conditional associations can be different!
Under the conditional independence model, the cell probabilities can be written as
\begin{align}
\pi_{ijk} &= P(A=i) P(B=j,C=k \mid A=i)\\
&= P(A=i)P(B=j \mid A=i)P(C=k \mid A=i)\\
&= \pi_{i++}\pi_{j|i}\pi_{k|i}\\
\end{align}
where Σ_{i} π_{i++} = 1, Σ_{j} π_{j|i} = 1 for each i, and Σ_{k} π_{k|i} = 1 for each i. The number of free parameters is (I − 1) + I (J − 1) + I (K − 1).
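To check this count, take I = 3 and J = K = 2, the dimensions of the boy scout example below (with socioeconomic status playing the role of the conditioning variable A):

\((I-1) + I(J-1) + I(K-1) = 2 + 3 + 3 = 8\)

free parameters, compared with \(IJK - 1 = 11\) for the saturated model; the difference, \(11 - 8 = 3 = I(J-1)(K-1)\), is the degrees of freedom of the goodness-of-fit test.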
The ML estimates of these parameters are
\(\hat{\pi}_{i++}=n_{i++}/n\)
\(\hat{\pi}_{j|i}=n_{ij+}/n_{i++}\)
\(\hat{\pi}_{k|i}=n_{i+k}/n_{i++}\)
and the estimated expected frequencies are
\(\hat{E}_{ijk}=\dfrac{n_{ij+}n_{i+k}}{n_{i++}}.\)
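As a sketch of this calculation (in Python rather than SAS or R, and with made-up counts for a hypothetical 2 × 2 × 2 table), the fitted values under (AB, AC) come directly from the margins:

```python
# Sketch: fitted counts under conditional independence (AB, AC).
# n[i][j][k] are hypothetical counts for a 2x2x2 table (levels of A, B, C).
n = [[[20, 8], [15, 6]],
     [[10, 12], [5, 14]]]

I, J, K = 2, 2, 2
E = [[[0.0] * K for _ in range(J)] for _ in range(I)]
for i in range(I):
    n_i = sum(n[i][j][k] for j in range(J) for k in range(K))   # n_{i++}
    for j in range(J):
        n_ij = sum(n[i][j])                                     # n_{ij+}
        for k in range(K):
            n_ik = sum(n[i][jj][k] for jj in range(J))          # n_{i+k}
            E[i][j][k] = n_ij * n_ik / n_i                      # E-hat_{ijk}
```

Note that the fitted values reproduce the n_{ij+} and n_{i+k} margins within each level of A, just as the independence fit does in a two-way table.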
Notice again the similarity to the formula for independence in a two-way table.
The test for conditional independence of B and C given A is equivalent to separating the table by levels of A = 1, . . . , I , and testing for independence within each level.
There are two ways we can test for conditional independence:
- The overall X^{2} or G^{2} statistic can be found by summing the individual test statistics for B-C independence across the levels of A. The total degrees of freedom for this test is I (J − 1)(K − 1). See the example below; we'll see more on this when we do log-linear models. Note that if we can reject independence in one of the partial tables, then we can reject conditional independence without running the full analysis.
- Cochran-Mantel-Haenszel Test (using the CMH option on the TABLES statement of PROC FREQ in SAS, and mantelhaen.test in R). This test produces the Mantel-Haenszel statistic, also known as the "average partial association" statistic.
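To preview what the CMH statistic computes (details in the next section), here is a Python sketch of the usual 2 × 2 × K formula without continuity correction, applied for illustration to the two partial tables whose counts appear in the R output below (S = 2 and S = 3; the S = 1 stratum is omitted here only because its cell counts are not reprinted in this excerpt, so this is not the full three-stratum test):

```python
# Sketch of the Cochran-Mantel-Haenszel statistic for 2x2xK tables
# (no continuity correction). Strata: the S=2 and S=3 partial tables of the
# boy scout data; rows = delinquent (no, yes), cols = scout (no, yes).
strata = [
    [[132, 104], [20, 14]],   # S = 2
    [[59, 196], [2, 8]],      # S = 3
]

num = 0.0   # sum over strata of (n_{11k} - E[n_{11k}])
den = 0.0   # sum over strata of Var(n_{11k}) under independence
for (a, b), (c, d) in strata:
    nk = a + b + c + d
    row1, col1 = a + b, a + c
    e11 = row1 * col1 / nk                                  # expected (1,1) cell
    var11 = row1 * (c + d) * col1 * (b + d) / (nk**2 * (nk - 1))
    num += a - e11
    den += var11

cmh = num**2 / den   # referred to a chi-square distribution with 1 df
```

With all three strata supplied, this is the statistic that mantelhaen.test in R and the CMH option in SAS report.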
Example: Boy Scouts and Juvenile Delinquency
Let us return to the table that classifies n = 800 boys by boy scout status B, juvenile delinquency D, and socioeconomic status S. We already found that the models of mutual independence (D, B, S) and joint independence (D, BS) did not fit. Thus we know that either B or S (or both) are related to D. Let us temporarily ignore S and see whether B and D are related (marginal independence). Ignoring S means that we classify individuals only by the variables B and D; in other words, we form a two-way table for B × D, the same table that we would get by collapsing (i.e., adding) over the levels of S.
             Delinquent
Boy scout    Yes    No
Yes           33   343
No            64   360
The X^{2} test for this marginal independence demonstrates that a relationship between B and D does exist. Expected counts are printed below the observed counts:
                 Delinquent=Yes   Delinquent=No   Total
Boy Scout=Yes         33              343          376
                      45.59           330.41
Boy Scout=No          64              360          424
                      51.41           372.59
Total                 97              703          800
X^{2} = 3.477 + 0.480 + 3.083 + 0.425 = 7.465, where each value in the sum is the contribution (squared Pearson residual) of one cell to the overall Pearson X^{2} statistic. With df = 1, the p-value is 1 − PROBCHI(7.465, 1) = 0.006 in SAS, or 1 - pchisq(7.465, 1) = 0.006 in R, so we reject the marginal independence of B and D. Or simply do the chi-squared test of independence on this 2 × 2 table!
The odds ratio of (33 · 360)/(64 · 343) = 0.54 indicates a strong negative relationship between boy scout status and delinquency; it appears that boy scouts are 46% less likely (on the odds scale) to be delinquent than non-scouts.
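The arithmetic above is easy to reproduce by hand; here is a Python sketch (standard library only, rather than the SAS/R code used in this lesson) of the marginal B × D test and odds ratio:

```python
import math

# Marginal B x D table: rows = boy scout (yes, no), cols = delinquent (yes, no)
obs = [[33, 343], [64, 360]]

n = sum(map(sum, obs))                  # 800
rows = [sum(r) for r in obs]            # 376, 424
cols = [sum(c) for c in zip(*obs)]      # 97, 703

# Pearson X^2 = sum of squared Pearson residuals over the four cells
x2 = sum((obs[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
         for i in range(2) for j in range(2))

# p-value for a chi-square with 1 df: P(chi^2_1 > x) = erfc(sqrt(x/2))
p = math.erfc(math.sqrt(x2 / 2))

# odds ratio
theta = obs[0][0] * obs[1][1] / (obs[0][1] * obs[1][0])
```

This reproduces X^{2} = 7.465, p ≈ 0.006, and θ ≈ 0.54.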
To a proponent of scouting, this result might suggest that being a boy scout has substantial benefits in reducing the rates of juvenile delinquency. But boy scouts tend to differ from nonscouts on a wide variety of characteristics. Could one of these characteristics—say, socioeconomic status—explain the apparent relationship between B and D?
Let’s now test the hypothesis that B and D are conditionally independent given S. To do this, we enter the data for each 2 × 2 partial table of B × D corresponding to the levels S = 1, S = 2, and S = 3, respectively, perform independence tests on these tables, and add up the X^{2} statistics (or run the CMH test, as in the next section).
To do this in SAS you can run the following command in boys.sas:
tables SES*scouts*delinquent / chisq;
Notice that the order is important; SAS will create partial tables for each level of the first variable; see boys.lst.
The individual chi-square statistics from the output after each partial table are given below. To test the conditional independence model (BS, DS), we add these up to get the overall chi-squared statistic:
0.053+0.006 + 0.101 = 0.160.
Each of the individual tests has 1 degree of freedom, so the total number of degrees of freedom is 3. The p-value is \(P(\chi^2_3 \geq 0.1600)=0.984\), indicating that the conditional independence model fits extremely well. As a result, we will not reject this model here. However, the p-value is so high, doesn't it make you wonder what is going on here?
The apparent relationship between B and D can be explained by S; after the systematic differences in social class among scouts and non-scouts are accounted for, there is no additional evidence that scout membership has any effect on delinquency. The fact that the p-value is so close to 1 suggests that the model fit is too good to be true; it suggests that the data may have been fabricated. (It’s true; Dr. Schafer made some of the data in order to illustrate this point!)
[Note: In the next section we will see how to use the CMH option in SAS; see boys.sas]
In R, in boys.R for example,
temp[,,1]
will give us the B × D partial table for the first level of S, and similarly for levels 2 and 3, where temp is the name of our three-way table in this code; see boys.out.
The individual chi-square statistics from the output after each partial table are given below.
> chisq.test(temp[,,1], correct=FALSE)

        Pearson's Chi-squared test

data:  temp[, , 1]
X-squared = 0.0058, df = 1, p-value = 0.9392

> temp[,,2]
         scout
deliquent  no yes
      no  132 104
      yes  20  14
> chisq.test(temp[,,2], correct=FALSE)

        Pearson's Chi-squared test

data:  temp[, , 2]
X-squared = 0.101, df = 1, p-value = 0.7507

> temp[,,3]
         scout
deliquent  no yes
      no   59 196
      yes   2   8
> chisq.test(temp[,,3], correct=FALSE)

        Pearson's Chi-squared test

data:  temp[, , 3]
X-squared = 0.0534, df = 1, p-value = 0.8172
To test the conditional independence model (BS, DS), we add these up to get the overall chi-squared statistic:
0.006 + 0.101 + 0.053 = 0.160.
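The pooled p-value can be checked by hand; here is a Python sketch (standard library only, rather than the pchisq call in R) using the closed-form survival function of a chi-square with 3 degrees of freedom:

```python
import math

# Pooled chi-square statistic from the three partial-table tests
x2 = 0.006 + 0.101 + 0.053      # = 0.160
df = 3

# For df = 3: P(chi^2_3 > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)
```

This gives p ≈ 0.984, matching 1 - pchisq(0.160, 3) in R.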
Each of the individual tests has 1 degree of freedom, so the total number of degrees of freedom is 3. As before, the p-value is \(P(\chi^2_3 \geq 0.1600)=0.984\), so the conditional independence model fits (suspiciously) well, for the reasons discussed above.
[Note: In the next section we will see how to use mantelhaen.test in R; see boys.R]
Spurious Relationship
To see how the spurious relationship between B and D could have been induced, it is worthwhile to examine the B × S and D × S marginal tables.
The B × S marginal table is shown below.
                       Boy scout
Socioeconomic status    Yes    No
Low                      54   211
Medium                  118   152
High                    204    61
The test of independence for this table yields X^{2} = 172.2 with 2 degrees of freedom, which gives a p-value of essentially zero. There is a highly significant relationship between B and S. To see what the relationship is, we can estimate the conditional probabilities of B = 1 for S = 1, S = 2, and S = 3:
P(B=1 | S=1) = 54/(54 + 211) = .204
P(B=1 | S=2) = 118/(118 + 152) = .437
P(B=1 | S=3) = 204/(204 + 61) = .769
The probability of being a boy scout rises dramatically as socioeconomic status goes up.
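These computations can be sketched in Python (standard library only, rather than the SAS/R code used in this lesson), using the B × S marginal counts above:

```python
# B x S marginal table: rows = SES (low, medium, high), cols = scout (yes, no)
obs = [[54, 211], [118, 152], [204, 61]]

n = sum(map(sum, obs))                  # 800
rows = [sum(r) for r in obs]            # 265, 270, 265
cols = [sum(c) for c in zip(*obs)]      # 376, 424

# Pearson X^2 for the 3 x 2 table (df = 2)
x2 = sum((obs[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
         for i in range(3) for j in range(2))

# Estimated P(B = 1 | S = i): proportion of scouts within each SES level
p_scout = [r[0] / sum(r) for r in obs]
```

This reproduces X^{2} = 172.2 and the conditional probabilities .204, .437, .769 (to three decimals, .770).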
Now let’s examine the D × S marginal table.
                       Delinquent
Socioeconomic status    Yes    No
Low                      53   212
Medium                   34   236
High                     10   255
The test for independence here yields X^{2} = 32.8 with 2 degrees of freedom, p-value ≈ 0. The estimated conditional probabilities of D = 1 for S = 1, S = 2, and S = 3 are shown below.
P(D=1 | S=1) = 53/(53 + 212) = .200
P(D=1 | S=2) = 34/(34 + 236) = .126
P(D=1 | S=3) = 10/(10 + 255) = .038
The rate of delinquency drops as socioeconomic status goes up. Now we see how S induces a spurious relationship between B and D. Boy scouts tend to be of higher social class than non-scouts, and boys in higher social classes have a smaller chance of being delinquent. The apparent effect of scouting is really an effect of social class.
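The same Python sketch applies to the D × S marginal table (again illustrative, standard library only):

```python
# D x S marginal table: rows = SES (low, medium, high), cols = delinquent (yes, no)
obs = [[53, 212], [34, 236], [10, 255]]

n = sum(map(sum, obs))                  # 800
rows = [sum(r) for r in obs]            # 265, 270, 265
cols = [sum(c) for c in zip(*obs)]      # 97, 703

# Pearson X^2 for the 3 x 2 table (df = 2)
x2 = sum((obs[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
         for i in range(3) for j in range(2))

# Estimated P(D = 1 | S = i): delinquency rate within each SES level
p_delinq = [r[0] / sum(r) for r in obs]
```

This reproduces X^{2} = 32.8 and the conditional probabilities .200, .126, .038.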
In the next section, we study how to test for conditional independence via the CMH statistic.
EXERCISE
Recall the results from death.sas (output: death.lst) and death.R (output: death.out) earlier in the lesson, where we tested for independence via odds ratios within the partial tables of Defendant's race vs. Death penalty, A × C, for each level of Victim's race, B; see the Notation section of this lesson if you don't recall marginal and partial tables. The question was: given the Victim's race, are the Defendant's race and Death penalty independent? In this case, the null hypothesis is that the conditional independence model, (AB, BC), fits. What is the graphical representation here? The null hypothesis can be stated in terms of the partial odds ratios:

H_{0}: θ_{AC(B=white)} = θ_{AC(B=black)} = 1

Based on the estimated partial odds ratios, their confidence intervals, and the chi-squared and deviance statistics for the test of independence in each of these partial tables, at the α = 0.05 level we do not have sufficient evidence to reject the null hypothesis, and thus the model of conditional independence describes the data well.