Lesson 10: Log-Linear Models


Introduction to Log-Linear Models

Thus far in the course we have alluded to log-linear models several times but have never covered their basics. When we dealt with interrelationships among several categorical variables, our focus was on describing independence, interactions, or associations among two, three, or more categorical variables, mostly via

  • single summary statistics, and
  • significance testing.

Log-linear models go beyond single summary statistics and specify how the cell counts depend on the levels of the categorical variables. They model the association and interaction patterns among categorical variables. Log-linear modeling is natural for Poisson, multinomial, and product-multinomial sampling. Log-linear models are appropriate when there is no clear distinction between response and explanatory variables, or when there are more than two responses. This is a major difference between logistic models and log-linear models: in the former a response is identified, whereas no variable is given that special status in log-linear modeling. By default, log-linear models treat the variables as nominal, but these models can be adjusted to handle ordinal and matched data. Log-linear models are more general than logit models, but some log-linear models correspond directly to logit models.
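As an illustration of "how the cell counts depend on the levels", here is a two-way table with expected cell counts μ_ij for row variable X and column variable Y. The sketch uses the common λ notation, which is an assumption here and may differ in detail from the notation used later in the lesson. It shows the independence model and the saturated model:

```latex
\begin{aligned}
\text{independence:}\quad & \log \mu_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y},\\
\text{saturated:}\quad & \log \mu_{ij} = \lambda + \lambda_i^{X} + \lambda_j^{Y} + \lambda_{ij}^{XY}.
\end{aligned}
```

X and Y enter symmetrically, so neither is treated as the response. The association between them is carried entirely by the interaction terms λ_ij^XY, and each set of λ terms needs an identifiability constraint, for example setting the parameters of a baseline level to zero.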

Consider graduate admissions at Berkeley. We may consider all possible relationships among A = Admission, D = Department, and S = Gender. Alternatively, we may treat A as the response and D and S as covariates, in which case the possible logit models are:

  • logit model for A with only an intercept;
  • logit model for A with a main effect for D;
  • logit model for A with a main effect for S;
  • logit model for A with main effects for D and S; and
  • logit model for A with main effects for D and S and the D × S interaction.
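Writing π_ij = P(A = admitted | D = i, S = j), the five logit models above can be written as follows. The symbols α and β are introduced here only for illustration. They are assumed to carry the usual identifiability constraints, such as a baseline department and gender whose effects are set to zero.

```latex
\begin{aligned}
&\operatorname{logit}\pi_{ij} = \alpha\\
&\operatorname{logit}\pi_{ij} = \alpha + \beta_i^{D}\\
&\operatorname{logit}\pi_{ij} = \alpha + \beta_j^{S}\\
&\operatorname{logit}\pi_{ij} = \alpha + \beta_i^{D} + \beta_j^{S}\\
&\operatorname{logit}\pi_{ij} = \alpha + \beta_i^{D} + \beta_j^{S} + \beta_{ij}^{DS}
\end{aligned}
```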

Each of the logit models above has a corresponding log-linear model. The notation below follows that of Lesson 5.

  • Model of joint independence (DS, A), which indicates that neither D nor S has an effect on A, is equivalent to a logit model for A with only an intercept;
  • Model of conditional independence (DS, DA), which indicates that gender has no effect on A once the effect of department is included, is equivalent to a logit model for A with a main effect for D;
  • Another conditional independence model, (DS, SA), which indicates that department has no effect on A once the effect of gender is included, is equivalent to a logit model for A with a main effect for S only;
  • Model of no three-factor interaction (DS, DA, SA), which indicates that the effect of gender on A is the same at each level of department, is equivalent to a logit model for A with main effects for D and S; and
  • Model of three-factor interaction, or the saturated model (DSA), which indicates that the effect of gender on A varies across departments, is equivalent to a logit model for A with main effects for D and S and the D × S interaction.
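The first two log-linear models in this list have closed-form maximum likelihood fitted counts, so the correspondence with the logit models can be checked directly. The sketch below assumes the standard 1973 Berkeley counts as distributed in R's `UCBAdmissions` dataset. The function names are hypothetical helpers, not part of any library.

Under (DS, A), every department and gender cell gets the same fitted admission probability, namely the overall admission rate, which is what the intercept-only logit model fits. Under (DS, DA), both genders within a department get that department's observed admission rate, which is what the logit model with a main effect for D fits.

```python
# Closed-form ML fitted counts for two log-linear models of the Berkeley table.
# Counts are stored as n[d][s][a]: d = department A..F,
# s = gender (0 = male, 1 = female), a = admission (0 = admitted, 1 = rejected).
# Assumed data: the 1973 counts as distributed in R's UCBAdmissions dataset.
BERKELEY = [
    [[512, 313], [89, 19]],    # department A
    [[353, 207], [17, 8]],     # department B
    [[120, 205], [202, 391]],  # department C
    [[138, 279], [131, 244]],  # department D
    [[53, 138], [94, 299]],    # department E
    [[22, 351], [24, 317]],    # department F
]


def fit_joint_independence(n):
    """(DS, A): mu[d][s][a] = n[d][s][+] * n[+][+][a] / n[+][+][+]."""
    levels_a = len(n[0][0])
    n_a = [sum(cell[a] for dept in n for cell in dept) for a in range(levels_a)]
    total = sum(n_a)
    return [[[sum(cell) * n_a[a] / total for a in range(levels_a)] for cell in dept]
            for dept in n]


def fit_conditional_independence(n):
    """(DS, DA): mu[d][s][a] = n[d][s][+] * n[d][+][a] / n[d][+][+]."""
    fitted = []
    for dept in n:
        levels_a = len(dept[0])
        n_da = [sum(cell[a] for cell in dept) for a in range(levels_a)]
        n_d = sum(n_da)
        fitted.append([[sum(cell) * n_da[a] / n_d for a in range(levels_a)]
                       for cell in dept])
    return fitted


def admission_probs(mu):
    """Fitted P(A = admitted | D = d, S = s) implied by fitted counts mu."""
    return [[cell[0] / sum(cell) for cell in dept] for dept in mu]


for label, fitted in [("(DS, A)", fit_joint_independence(BERKELEY)),
                      ("(DS, DA)", fit_conditional_independence(BERKELEY))]:
    probs = admission_probs(fitted)
    print(label, [[round(p, 3) for p in dept] for dept in probs])
```

Running it prints one fitted admission probability per department and gender for each model. Under (DS, DA), the two genders share a single value within each department.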

“Equivalent” means that the two models give the same goodness-of-fit statistics relative to the saturated model and the same expected counts for each cell. Log-linear models are not exactly the same as logit models, because log-linear models describe the joint distribution of all three variables, whereas logit models describe only the conditional distribution of A given D and S. Log-linear models therefore have more parameters than the corresponding logit models, but the extra parameters, which describe the joint distribution of D and S, are not of interest.
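To see "equivalent goodness-of-fit statistics" concretely, the sketch below computes two quantities for each model pair. The first is G² for the log-linear models (DS, A) and (DS, DA), taken over the 24 D × S × A cells. The second is the binomial deviance for the matching logit models, taken over the 12 D × S groups. It assumes the same Berkeley counts as R's `UCBAdmissions` dataset and uses the closed-form ML fits. It is a check of the equivalence, not a general fitting routine, and its helper names are hypothetical.

```python
import math

# n[d][s][a]: department A..F, gender (male, female), admission (admitted, rejected).
# Assumed data: the 1973 counts as distributed in R's UCBAdmissions dataset.
BERKELEY = [
    [[512, 313], [89, 19]], [[353, 207], [17, 8]], [[120, 205], [202, 391]],
    [[138, 279], [131, 244]], [[53, 138], [94, 299]], [[22, 351], [24, 317]],
]


def term(obs, fit):
    """One contribution obs * log(obs / fit); a zero observed count contributes 0."""
    return obs * math.log(obs / fit) if obs > 0 else 0.0


def g2_loglinear(n, mu):
    """G^2 = 2 * sum over all D x S x A cells of n * log(n / mu)."""
    return 2 * sum(term(n[d][s][a], mu[d][s][a])
                   for d in range(len(n)) for s in range(len(n[d]))
                   for a in range(len(n[d][s])))


def deviance_logit(n, prob):
    """Binomial deviance of a logit model; prob[d][s] is the fitted P(admitted)."""
    dev = 0.0
    for d, dept in enumerate(n):
        for s, (admitted, rejected) in enumerate(dept):
            m = admitted + rejected
            dev += term(admitted, m * prob[d][s]) + term(rejected, m * (1 - prob[d][s]))
    return 2 * dev


# Log-linear (DS, A) and the intercept-only logit (ML estimate = overall admission rate).
n_a = [sum(cell[a] for dept in BERKELEY for cell in dept) for a in range(2)]
total = sum(n_a)
mu_joint = [[[sum(cell) * n_a[a] / total for a in range(2)] for cell in dept]
            for dept in BERKELEY]
prob_intercept = [[n_a[0] / total for _ in dept] for dept in BERKELEY]

# Log-linear (DS, DA) and the logit with a main effect for D
# (ML estimate = each department's observed admission rate).
mu_cond, prob_dept = [], []
for dept in BERKELEY:
    n_da = [sum(cell[a] for cell in dept) for a in range(2)]
    n_d = sum(n_da)
    mu_cond.append([[sum(cell) * n_da[a] / n_d for a in range(2)] for cell in dept])
    prob_dept.append([n_da[0] / n_d for _ in dept])

print("(DS, A) vs intercept logit:", g2_loglinear(BERKELEY, mu_joint),
      deviance_logit(BERKELEY, prob_intercept))
print("(DS, DA) vs D logit:", g2_loglinear(BERKELEY, mu_cond),
      deviance_logit(BERKELEY, prob_dept))
```

The two statistics in each pair agree up to floating-point rounding, and so do the degrees of freedom. (DS, A) has 24 - 13 = 11 residual df, matching 12 - 1 = 11 for the intercept-only logit. (DS, DA) has 24 - 18 = 6, matching 12 - 6 = 6 for the logit model with a main effect for D.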

In general, to construct a log-linear model that is equivalent to a logit model, we need to include all possible associations among the predictors. In the Berkeley example, this means including DS in every model. This lesson walks through examples of how this is done in both SAS and R.
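One way to see why DS must be included: a logit model for A conditions on the observed D × S totals. Meanwhile, the ML fitted counts of a log-linear model reproduce the observed margins of every term the model contains. The sketch below compares the fitted D × S totals of (DA, SA), which omits DS, with those of (DS, DA). It assumes the same `UCBAdmissions` counts, and its helper names are hypothetical. Both models have closed-form fits; (DA, SA) says D and S are independent given A.

```python
# n[d][s][a]: department A..F, gender (male, female), admission (admitted, rejected).
# Assumed data: the 1973 counts as distributed in R's UCBAdmissions dataset.
BERKELEY = [
    [[512, 313], [89, 19]], [[353, 207], [17, 8]], [[120, 205], [202, 391]],
    [[138, 279], [131, 244]], [[53, 138], [94, 299]], [[22, 351], [24, 317]],
]


def ds_margin(table):
    """D x S totals: sum over A of table[d][s][a]."""
    return [[sum(cell) for cell in dept] for dept in table]


def fit_da_sa(n):
    """(DA, SA): mu[d][s][a] = n[d][+][a] * n[+][s][a] / n[+][+][a]."""
    D, S, A = len(n), len(n[0]), len(n[0][0])
    n_da = [[sum(n[d][s][a] for s in range(S)) for a in range(A)] for d in range(D)]
    n_sa = [[sum(n[d][s][a] for d in range(D)) for a in range(A)] for s in range(S)]
    n_a = [sum(n_da[d][a] for d in range(D)) for a in range(A)]
    return [[[n_da[d][a] * n_sa[s][a] / n_a[a] for a in range(A)] for s in range(S)]
            for d in range(D)]


def fit_ds_da(n):
    """(DS, DA): mu[d][s][a] = n[d][s][+] * n[d][+][a] / n[d][+][+]."""
    fitted = []
    for dept in n:
        n_da = [sum(cell[a] for cell in dept) for a in range(len(dept[0]))]
        n_d = sum(n_da)
        fitted.append([[sum(cell) * n_da[a] / n_d for a in range(len(n_da))]
                       for cell in dept])
    return fitted


def max_margin_gap(n, fitted):
    """Largest absolute difference between observed and fitted D x S totals."""
    return max(abs(o - f) for row_o, row_f in zip(ds_margin(n), ds_margin(fitted))
               for o, f in zip(row_o, row_f))


print("(DA, SA) gap:", round(max_margin_gap(BERKELEY, fit_da_sa(BERKELEY)), 1))
print("(DS, DA) gap:", max_margin_gap(BERKELEY, fit_ds_da(BERKELEY)))
```

The (DS, DA) gap is zero up to rounding. (DA, SA) misses some D × S totals by well over a hundred applicants, so it cannot be equivalent to any logit model for A.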

In subsequent sections we look at log-linear models in more detail. Their two great advantages are flexibility and interpretability. Log-linear models have all the flexibility associated with ANOVA and regression. As mentioned before, log-linear models are also a type of GLM. They have natural interpretations in terms of odds and frequently have interpretations in terms of independence, as shown above.
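Because a log-linear model is a Poisson GLM with a log link, it can be fitted with the iteratively reweighted least squares (IRLS) algorithm used for GLMs in general. The pure-Python sketch below is illustrative only; its helper names are hypothetical, and in practice a statistics package would be used. It fits two models to the 2 × 2 gender × admission table obtained by collapsing the assumed `UCBAdmissions` counts over departments:

- the independence model, whose fitted counts should equal row total × column total / n;
- the saturated model, whose interaction coefficient exponentiates to the sample odds ratio, illustrating the odds interpretation.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def fit_poisson_loglinear(X, y, iterations=50, tol=1e-10):
    """Fit log(mu) = X beta to counts y by IRLS (Poisson family, log link).
    Assumes column 0 of X is the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))
    for _ in range(iterations):
        eta = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Working response z and weights w = mu for the Poisson log-link GLM.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z)) for i in range(p)]
        new_beta = solve(xtwx, xtwz)
        converged = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = [math.exp(sum(xj * bj for xj, bj in zip(row, beta))) for row in X]
    return beta, mu


# Gender x admission counts collapsed over departments (assumed UCBAdmissions totals).
# Cell order: (male, admitted), (male, rejected), (female, admitted), (female, rejected).
COUNTS = [1198, 1493, 557, 1278]
# Columns: intercept, male indicator, admitted indicator.
X_INDEPENDENCE = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
# The saturated model adds the male x admitted interaction column.
X_SATURATED = [row + [row[1] * row[2]] for row in X_INDEPENDENCE]

_, mu_ind = fit_poisson_loglinear(X_INDEPENDENCE, COUNTS)
beta_sat, _ = fit_poisson_loglinear(X_SATURATED, COUNTS)
print("independence fitted counts:", [round(m, 1) for m in mu_ind])
print("odds ratio from interaction:", round(math.exp(beta_sat[3]), 3))
```

In the saturated model the interaction coefficient equals log[(μ11 μ22)/(μ12 μ21)], so exponentiating it recovers the odds ratio of admission for men versus women in this collapsed table.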