12.4 - Example: Places Rated Data - Principal Component Method

Example 12-1: Places Rated

Let's revisit the Places Rated Example from Lesson 11.  Recall that the Places Rated Almanac (Boyer and Savageau) rates 329 communities according to nine criteria:

  1. Climate and Terrain
  2. Housing
  3. Health Care & Environment
  4. Crime
  5. Transportation
  6. Education
  7. The Arts
  8. Recreation
  9. Economic

Except for housing and crime, the higher the score, the better. For housing and crime, the lower the score, the better.

Our objective here is to describe the relationships among the variables.

Before carrying out a factor analysis, we need to determine m: how many common factors should be included in the model? This requires determining how many parameters will be involved.

For p = 9, the variance-covariance matrix \(\Sigma\) contains

\(\dfrac{p(p+1)}{2} = \dfrac{9 \times 10}{2} = 45\)

unique elements or entries. For a factor analysis with m factors, the number of parameters in the factor model is equal to

\(p(m+1) = 9(m+1)\)

Taking m = 4, we have 45 parameters in the factor model. This equals the number of original parameters and would result in no dimension reduction. So in this case, we will select m = 3, yielding 36 parameters in the factor model and thus a dimension reduction in our analysis.
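
The parameter count above can be tabulated for several values of m. The following is a minimal sketch in Python; the function names are illustrative and are not part of the SAS or Minitab workflow used in this lesson.

```python
def covariance_parameters(p: int) -> int:
    """Number of unique entries in a p x p variance-covariance matrix."""
    return p * (p + 1) // 2


def factor_model_parameters(p: int, m: int) -> int:
    """p*m factor loadings plus p specific variances, i.e. p(m + 1)."""
    return p * (m + 1)


p = 9
original = covariance_parameters(p)
for m in range(1, 5):
    model = factor_model_parameters(p, m)
    status = "reduction" if model < original else "no reduction"
    print(f"m = {m}: {model} model parameters vs {original} original ({status})")
```

For p = 9, m = 3 is the largest number of factors for which the factor model has fewer parameters than the variance-covariance matrix.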

It is also common to look at the results of the principal components analysis. The output from Lesson 11.6 is below. The first three components explain 62% of the variation. We consider this to be sufficient for the current example and will base future analyses on three components.

Component Eigenvalue Proportion Cumulative
1 3.2978 0.3664 0.3664
2 1.2136 0.1348 0.5013
3 1.1055 0.1228 0.6241
4 0.9073 0.1008 0.7249
5 0.8606 0.0956 0.8205
6 0.5622 0.0625 0.8830
7 0.4838 0.0538 0.9368
8 0.3181 0.0353 0.9721
9 0.2511 0.0279 1.0000

We need to select m so that a sufficient amount of variation in the data is explained. What is sufficient is, of course, subjective and depends on the example at hand.
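
The proportion and cumulative columns in the table follow directly from the eigenvalues. The sketch below recomputes them in Python and picks the smallest m that reaches a chosen threshold. The threshold values and the helper name `smallest_m` are assumptions made for illustration; the lesson itself does not prescribe a numerical cutoff.

```python
from itertools import accumulate

# Eigenvalues copied from the principal components output above.
eigenvalues = [3.2978, 1.2136, 1.1055, 0.9073, 0.8606,
               0.5622, 0.4838, 0.3181, 0.2511]

total = sum(eigenvalues)  # about 9, the number of standardized variables
proportions = [ev / total for ev in eigenvalues]
cumulative = list(accumulate(proportions))


def smallest_m(cumulative_proportions, threshold):
    """Smallest number of components whose cumulative proportion reaches threshold."""
    for m, c in enumerate(cumulative_proportions, start=1):
        if c >= threshold:
            return m
    return len(cumulative_proportions)


for i, (prop, cum) in enumerate(zip(proportions, cumulative), start=1):
    print(f"{i}  {prop:.4f}  {cum:.4f}")

# Hypothetical threshold of 60% of the total variation.
print("m for 60%:", smallest_m(cumulative, 0.60))
```

A stricter threshold gives a larger m, which is why the choice remains a judgment call.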

Alternatively, often in social sciences, the underlying theory within the field of study indicates how many factors to expect. In psychology, for example, a circumplex model suggests that mood has two factors: positive affect and arousal. So a two-factor model may be considered for questionnaire data regarding the subjects' moods. In many respects, this is a better approach because then you are letting the science drive the statistics rather than the statistics drive the science! If you can, use your or a field expert's scientific understanding to determine how many factors should be included in your model.

Using SAS

The factor analysis is carried out using the SAS program places2.sas.

Using Minitab

A factor analysis can also be performed using the Minitab statistical software application.

Initially, we will look at the factor loadings. The factor loadings are obtained by using this expression

\(\hat{e}_{i}\sqrt{ \hat{\lambda}_{i}}\)

These are summarized in the table below. The factor loadings are only recorded for the first three factors because we set m = 3. We should also note that the factor loadings are the correlations between the factors and the variables. For example, the correlation between the Arts and the first factor is about 0.86. Similarly, the correlation between Climate and that factor is only about 0.29.

  Factor
Variable 1 2 3
Climate 0.286 0.076 0.841
Housing 0.698 0.153 0.084
Health 0.744 -0.410 -0.020
Crime 0.471 0.522 0.135
Transportation 0.681 -0.156 -0.148
Education 0.498 -0.498 -0.253
Arts 0.861 -0.115 0.011
Recreation 0.642 0.322 0.044
Economics 0.298 0.595 -0.533
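
As a consistency check on the table, the sum of squared loadings in each column should reproduce the corresponding eigenvalue, because each loading vector is a unit-length eigenvector scaled by \(\sqrt{\hat{\lambda}_{i}}\). The Python sketch below performs this check using the rounded values from the table, so small discrepancies are expected; the variable names are illustrative.

```python
# Loadings copied from the table above (Factor 1, Factor 2, Factor 3).
loadings = {
    "Climate":        (0.286,  0.076,  0.841),
    "Housing":        (0.698,  0.153,  0.084),
    "Health":         (0.744, -0.410, -0.020),
    "Crime":          (0.471,  0.522,  0.135),
    "Transportation": (0.681, -0.156, -0.148),
    "Education":      (0.498, -0.498, -0.253),
    "Arts":           (0.861, -0.115,  0.011),
    "Recreation":     (0.642,  0.322,  0.044),
    "Economics":      (0.298,  0.595, -0.533),
}

# First three eigenvalues from the principal components output.
eigenvalues = (3.2978, 1.2136, 1.1055)

# Sum of squared loadings for each factor (column of the table).
column_sums = [
    sum(row[j] ** 2 for row in loadings.values())
    for j in range(3)
]

for j, (ss, ev) in enumerate(zip(column_sums, eigenvalues), start=1):
    print(f"Factor {j}: sum of squared loadings = {ss:.4f}, eigenvalue = {ev:.4f}")
```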

Interpreting factor loadings is similar to interpreting the coefficients for principal component analysis. We need to choose an inclusion criterion, which in many instances may be somewhat arbitrary. In the above table, we consider a loading large when its absolute value is about 0.5 or more. The following statements are based on this criterion:

  1. Factor 1 is correlated most strongly with Arts (0.861) and also correlated with Health, Housing, Transportation, Recreation, and to a lesser extent Crime and Education. You can say that the first factor is primarily a measure of these variables.

  2. Factor 2 is primarily related to Crime, Education, and Economics. Factor 2 is associated with high levels of Crime and Economics and low Education ratings. It therefore distinguishes cities with high economic levels and high levels of crime from cities with highly rated educational systems.

  3. Factor 3 is primarily a measure of Climate and is also negatively related to Economics. Factor 3 therefore contrasts cities with highly rated climates against cities with highly rated economies.
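
The inclusion criterion behind the statements above can be applied mechanically. The following Python sketch lists, for each factor, the variables whose absolute loading meets a cutoff. The helper `salient_variables` and the alternative cutoff of 0.47 are assumptions for illustration, chosen to show how borderline loadings such as Crime (0.471) and Education (0.498) enter or leave the interpretation.

```python
# Loadings copied from the factor loadings table (Factor 1, Factor 2, Factor 3).
loadings = {
    "Climate":        (0.286,  0.076,  0.841),
    "Housing":        (0.698,  0.153,  0.084),
    "Health":         (0.744, -0.410, -0.020),
    "Crime":          (0.471,  0.522,  0.135),
    "Transportation": (0.681, -0.156, -0.148),
    "Education":      (0.498, -0.498, -0.253),
    "Arts":           (0.861, -0.115,  0.011),
    "Recreation":     (0.642,  0.322,  0.044),
    "Economics":      (0.298,  0.595, -0.533),
}


def salient_variables(table, cutoff=0.5):
    """For each factor, list the variables with |loading| >= cutoff."""
    n_factors = len(next(iter(table.values())))
    return [
        [name for name, row in table.items() if abs(row[j]) >= cutoff]
        for j in range(n_factors)
    ]


for cutoff in (0.5, 0.47):  # 0.47 is a hypothetical, slightly looser cutoff
    print(f"cutoff = {cutoff}")
    for j, names in enumerate(salient_variables(loadings, cutoff), start=1):
        print(f"  Factor {j}: {', '.join(names)}")
```

With a strict cutoff of 0.5, Education is not flagged on any factor. Loosening the cutoff slightly brings Crime and Education into Factor 1 and Education into Factor 2, which matches the "about 0.5" reading used in the text.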

The interpretation above is very similar to that obtained in the standardized principal component analysis.
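
This similarity is expected, because the principal component method builds the loadings directly from the eigenvalues and eigenvectors of the correlation matrix via \(\hat{e}_{i}\sqrt{ \hat{\lambda}_{i}}\). The sketch below illustrates the computation with NumPy. The 3 x 3 correlation matrix is hypothetical and is not the Places Rated correlation matrix, and the function name `pc_loadings` is an assumption; the lesson itself uses SAS and Minitab.

```python
import numpy as np

# Hypothetical correlation matrix, for illustration only.
R = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])


def pc_loadings(corr, m):
    """Loadings for the first m factors: eigenvector_i * sqrt(eigenvalue_i)."""
    # eigh is appropriate for symmetric matrices; it returns eigenvalues
    # in ascending order, so reorder them from largest to smallest.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    loadings = eigenvectors[:, :m] * np.sqrt(eigenvalues[:m])
    return loadings, eigenvalues


loadings, eigenvalues = pc_loadings(R, m=2)

# The sign of each column is arbitrary, as with principal components.
print(np.round(loadings, 3))
print("Proportion explained:", np.round(eigenvalues / eigenvalues.sum(), 4))
```

The columns of `loadings` are the principal component eigenvectors rescaled by the square roots of their eigenvalues, so the pattern of large and small entries matches that of the standardized principal components.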