English or languish - Probing the ramifications
of Hong Kong's language policy
Factor Analysis
Principal Component and Common Factor Analysis


Key Features

  • Variables

    • Metric (interval or ratio) - Many
    • Dummy variables

  • Sample Size - As the number of respondents to the HKLNA study is expected to be large, sample size is not likely to be a problem for this procedure. As a general rule, the number of observations should be four to five times the number of variables.

  • Objective - Data reduction and summarization

  • General Uses

    Factor analysis is an interdependence technique that examines the interrelationships among a large number of variables in an effort to determine underlying dimensions (factors). Factor analysis can be used for the following purposes:

    1. R-Analysis - Identifies a set of underlying dimensions among a large number of variables.

    2. Q-Analysis - Condenses a large number of observations into distinctly different groups whose shared characteristics describe the population from which the observations are drawn.

    3. Identify key variables among a large number of variables for the purpose of further analysis using other, often predictive, statistical analyses - surrogate variable selection.

    4. Create an entirely new, but less numerous, set of variables to replace the original set for the purpose of further analysis using other statistical techniques.

  • Statistical Procedure - Although factor analysis is a very useful statistical procedure, it requires a large number of decisions in order to obtain meaningful results. For this reason a decision tree and corresponding explanation have been created. There are two major approaches to factor analysis:

    • Principal component analysis
    • Common factor analysis

Key Terms

  • Communality - The communality of a variable is the proportion of that variable's total variance explained by all of the factors on which it loads. It is the sum of the squared loadings of the variable across all factors.

$H_i^2 = a_{i1}^2 + a_{i2}^2 + \dots + a_{iP}^2$

The square root of the communality of each variable, $\sqrt{H_i^2}$, is the length of that variable's vector of factor loadings in factor space.

Subtracting the communality of a variable from one yields that variable's uniqueness (unique variance).
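
As an illustration of the communality and uniqueness definitions above, the following is a minimal Python sketch. The loading matrix is hypothetical (four variables on two factors), and the use of NumPy is an assumption of this example rather than part of the original procedure.

```python
import numpy as np

# Hypothetical loadings a_ij: 4 variables (rows) on 2 factors (columns).
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.3],
    [0.2, 0.9],
    [0.1, 0.6],
])

# Communality H_i^2: row sum of squared loadings for each variable.
communality = (loadings ** 2).sum(axis=1)

# Length of each variable's loading vector in factor space.
vector_length = np.sqrt(communality)

# Uniqueness: one minus the communality.
uniqueness = 1.0 - communality

for i, (h2, u) in enumerate(zip(communality, uniqueness), start=1):
    print(f"variable {i}: communality={h2:.2f}, uniqueness={u:.2f}")
```

In this made-up example the third variable is the best explained by the two factors and the fourth the least.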

  • Eigenvalues - The eigenvalue of a factor is the column sum of the squared loadings on that factor.

    • $EV_j = a_{1j}^2 + a_{2j}^2 + \dots + a_{Nj}^2$
    • Eigenvalues are the roots of the characteristic equation of the correlation matrix. There is one eigenvalue for each factor.
    • $\sum_j \lambda_j = \sum_i H_i^2$
      for j = 1, 2, ..., P and i = 1, 2, ..., N, where $\lambda_j$ is the eigenvalue $EV_j$ of the jth factor.
    • Dividing eigenvalues by either the number of variables (component analysis) or the sum of the communalities (common factor analysis) and multiplying by 100 yields the percent of variation (see percent of variance) which a single factor takes into account.
    • Although the eigensums are the same for both the rotated and unrotated factor solutions, the eigenvalues themselves have different meanings. In the initial, unrotated solution the vertical sum of the squared loadings tells us something about the relative importance of each factor. This is not true for the rotated version!
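
The eigenvalue relationships above can be checked numerically for the principal component case. The sketch below is only an illustration: the data are simulated (not HKLNA data), NumPy is assumed, and all components are retained so that the sums match exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 200 observations on 5 correlated variables.
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(200, 5))

# Correlation matrix with ones on the principal diagonal (component analysis).
R = np.corrcoef(X, rowvar=False)

# Eigenvalues are the roots of the characteristic equation of R.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Unrotated component loadings: eigenvectors scaled by sqrt(eigenvalue).
loadings = eigvecs * np.sqrt(eigvals)

column_sums = (loadings ** 2).sum(axis=0)    # one value per factor
communalities = (loadings ** 2).sum(axis=1)  # one value per variable

# Percent of variance per component: eigenvalue / number of variables * 100.
percent_of_variance = 100 * eigvals / R.shape[0]

print(np.round(eigvals, 3))
print(np.round(percent_of_variance, 1))
```

Here the column sums of squared loadings reproduce the eigenvalues, and the eigenvalues and communalities both sum to the number of variables.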

  • Factor Loading

    • $a_{ij}$ - A factor loading is the correlation between an original variable (or observation, when performing Q-analysis) and a particular factor.
    • $a_{ij}^2$ - The square of a factor loading is the proportion of a variable's variance that it shares with a particular factor.

  • Factor Matrix - The tabulated numerical output of a factor solution. A factor matrix generally lists the factors together with the loading of each variable on them. Communalities and eigenvalues are also provided.

    • Factor Pattern Matrix - This matrix contains the coefficients of the linear combination of factors that determines the standardized value of each variable for any given observation:

$Z_i = a_{i1}F_1 + a_{i2}F_2 + \dots + a_{iP}F_P + e_iY_i$

where $Z_i$ = the standardized value of the ith variable ($X_i$),

where $a_{ij}$ = a regression coefficient for the ith variable and the jth factor,

where $F_j$ = the factor score for the jth factor, and

where $e_iY_i$ = the error term.

  • Factor Scores

    Composite measures that reflect the importance of each factor relative to individual observations.
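
For the principal component case, factor scores and the factor pattern equation can be sketched as follows. This is a simplified illustration on simulated data, assuming NumPy; because every component is retained, the error term vanishes and the standardized variables are reproduced exactly. Common factor analysis estimates scores differently and is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated raw data: 100 observations on 4 correlated variables.
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

# Standardized variables Z_i (sample standard deviation, ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)        # a_ij
scores = Z @ eigvecs / np.sqrt(eigvals)      # F_j, one column per component

# Factor pattern equation with all components kept: Z = F A'.
Z_rebuilt = scores @ loadings.T

# Correlation between each variable and each component score.
n = Z.shape[0]
variable_score_corr = Z.T @ scores / (n - 1)

print(np.round(np.cov(scores, rowvar=False), 3))
```

The printed covariance matrix of the scores is the identity, which is the orthogonal case described under factor solutions, and the variable-score correlations coincide with the loadings.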

  • Factor solutions - A factor solution is simply the set of factors that result from a factor analysis experiment. Factor solutions fall into two major types:

    • Orthogonal Factor Solutions, and
    • Oblique Factor Solutions

Orthogonal factor solutions yield factors that are statistically independent and can be used with other statistical procedures that must satisfy assumptions of statistical independence.

Oblique factor solutions are solutions for which factors are correlated. They are usually a better reflection of the underlying reality which the researcher seeks to describe.

  • Percent of Trace - The part of total model variance taken into account by a single factor.

$PV_j = [(a_{1j}^2 + a_{2j}^2 + \dots + a_{Nj}^2) / T] \cdot 100$

where j = the jth factor

where T = the trace of the correlation matrix

Caution: The percents of trace for rotated and unrotated common factor solutions are derived differently from those of principal component analysis. For common factor analysis the trace for rotated solutions is the same as that for unrotated solutions. Whereas the summed communalities for unrotated solutions are equal to the trace of the correlation matrix, their sum for rotated solutions is not.

    • Total Percent of Trace - Summing across the percents of trace for each factor yields the total percent of trace. The total percent of trace is an indicator of how well a particular factor solution accounts for the variance of all the variables. If the variables are all very different the total percent of trace will be low. If the variables are similar the total percent of trace will be high.
  • Percent of Variance - The part of total variance taken into account by a single factor.

$PV_j = (\lambda_j / N) \cdot 100$

where N = the total number of variables, or equivalently the total model variance.

For common factor analysis the percent of trace and the percent of variance differ: the first measures the proportion of common variance, and the second the proportion of total variance, taken into account by the factor in question.

    • Percent of Total Variance (PTV) - The common variance explained by all factors as a percentage of total variance.

      $PTV = PV_1 + PV_2 + \dots + PV_P$
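
To make the difference between percent of variance and percent of trace concrete, here is a small sketch for a common factor solution. The loading matrix is hypothetical, NumPy is assumed, and the trace T is taken to be the sum of the communalities, as described under Trace.

```python
import numpy as np

# Hypothetical common factor loadings: 5 variables on 2 factors.
loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.2],
    [0.6, 0.4],
    [0.1, 0.7],
    [0.0, 0.5],
])
n_variables = loadings.shape[0]

# Column sums of squared loadings, one per factor.
column_sums = (loadings ** 2).sum(axis=0)

# Trace for common factor analysis: the sum of the communalities.
trace = (loadings ** 2).sum()

# Share of TOTAL variance (divide by the number of variables).
percent_of_variance = 100 * column_sums / n_variables

# Share of COMMON variance (divide by the trace).
percent_of_trace = 100 * column_sums / trace

# Percent of total variance explained by all factors together.
ptv = percent_of_variance.sum()

print(np.round(percent_of_variance, 2), round(ptv, 2))
print(np.round(percent_of_trace, 2))
```

With these made-up loadings the two factors account for 36.4 and 18.8 percent of total variance (55.2 percent together), while their percents of trace sum to 100 because the trace contains only common variance.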
  • Principal Diagonal - The principal diagonal of the correlation matrix, whose elements are defined differently according to the factoring procedure employed.

    • Principal component analysis - the elements of the principal diagonal are equal to one. In other words, each variable correlates exactly with itself, and all variance associated with that variable, both systematic and unsystematic, is included in the analysis.
    • Common factor analysis - the elements of the principal diagonal are not equal to one; rather, they are equal to the communalities associated with the variables of the original unrotated solution.

  • Sources of Variance

    • Common variance - the variance that a single variable shares in common with one or more of the other variables of the analysis.
    • Unique variance - the sum of the systematic and unsystematic variance that is shared with no other variable.

      • Specific variance - the systematic variance specific to a particular variable and not shared with any other variable.
      • Error variance - the unsystematic variance specific to a particular variable.

    • Total variance - For any given variable the total variance associated with that variable is equal to the sum of its common variance and unique variance, or alternatively

total variance = common variance + unique variance

where unique variance = specific variance + error variance

  • Surrogate variable - That variable which loads heaviest on a factor and is used to represent that factor in subsequent analysis. Surrogate variables can be utilized in lieu of factor scores.
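
A surrogate variable can be picked mechanically from a loading matrix, as in the sketch below. The variable names and loadings are hypothetical, NumPy is assumed, and "loads heaviest" is interpreted here as the largest absolute loading, which is an assumption of this example.

```python
import numpy as np

# Hypothetical variable names and loadings (4 variables, 2 factors).
variables = ["v1", "v2", "v3", "v4"]
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.3],
    [-0.2, 0.9],
    [0.1, -0.6],
])

# For each factor (column), find the row with the largest absolute loading.
surrogate_idx = np.abs(loadings).argmax(axis=0)
surrogates = [variables[i] for i in surrogate_idx]

for j, name in enumerate(surrogates, start=1):
    print(f"factor {j}: surrogate variable {name}")
```

In practice the choice would also be weighed against how interpretable and reliable each candidate variable is, not the loading alone.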
  • Trace - The sum of the elements of the principal diagonal of the correlation matrix.

    • Principal component analysis - the trace equals the total number of variables included in the analysis
    • Common factor analysis - the trace equals the sum of the communalities of all variables of the initial unrotated factor solution.

  • Uniqueness - The unique variance of a variable with respect to the other variables of a factor analysis is obtained as follows:

    $U_i = 1 - H_i^2$

    See Sources of Variance for further discussion.

Orthogonal Rotation

There are several orthogonal rotation techniques, including:

  • Quartimax - The Quartimax approach seeks to simplify the rows of the factor matrix. The goal is to obtain factor loadings in which each variable loads high on only one factor and low on all others. As a result, many variables tend to load heavily on a single factor.
  • Varimax - The Varimax approach seeks to simplify the columns of the factor matrix. Thus, for any one factor, all variables tend to load either very high or very low.
  • Equimax - The Equimax approach seeks a balance between row and column simplification.

A thorough analysis might employ them all.
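
The three techniques above are commonly treated as members of the orthomax family, differing only in a single weight. The sketch below shows one widely used SVD-based iteration; it is not necessarily the routine used by any particular statistics package. The function name `orthomax`, the parameter `gamma`, the hypothetical loading matrix, and the omission of Kaiser normalization are all assumptions of this example. Under this formulation gamma = 0 corresponds to Quartimax, gamma = 1 to Varimax, and gamma = (number of factors) / 2 to Equimax.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Rotate a loading matrix orthogonally (raw, no Kaiser normalization)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target; gamma weights the column-simplification term.
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        target = rotated ** 3 - (gamma / n_vars) * rotated @ col_ss
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                  # nearest orthogonal matrix
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion < criterion * (1 + tol):
            break                          # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: 5 variables on 2 factors.
unrotated = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.6, -0.6],
    [0.7, -0.5],
    [0.6, 0.5],
])

rotated, rotation = orthomax(unrotated, gamma=1.0)   # Varimax

print(np.round(rotated, 3))
print("column sums before:", np.round((unrotated ** 2).sum(axis=0), 3))
print("column sums after: ", np.round((rotated ** 2).sum(axis=0), 3))
```

Because the rotation matrix is orthogonal, each variable's communality and the overall sum of squared loadings are unchanged, while the column sums are redistributed across the factors.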


Factor Extraction Criteria

There are several different criteria commonly employed to determine the number of factors to extract. These include the

