English or languish - Probing the ramifications
of Hong Kong's language policy
Cluster Analysis

Key Features

  • Variables and Data

    • modified non-metric data - similarity measures
    • metric data (interval and ratio scales) - distance measures
    • variable interdependence (no distinction between dependent and independent variables)
    • sample size - small to large
    • variable selection

  • Objective - data summary and classification as opposed to data reduction

  • General Uses - An important first goal of science is to find consistent patterns that reflect the way in which the world is ordered. Cluster analysis is a statistically based analytical approach that helps researchers identify such patterns. Cluster-analytical approaches can be divided into two major categories:

    • Classification (well-defined clusters)

      The assignment of observations to groups whose members share common attributes. Unlike multiple discriminant analysis, no assumptions about the nature of the groups are made before the analysis begins.

    • Structural Modelling (fuzzy clusters)

      The representation of hidden but suspected structural relationships. This is an alternative approach to multidimensional scaling and factor analysis, and includes the

      • construction of dendrograms - the hierarchical ranking of groups of observations based on the degree of presence or absence of a selected set of attributes shared among the observations of each group;

      • construction of overlapping clusters - a spatial rendering that clusters observations with similar attributes into overlapping sets.

  • Statistical Procedures - Cluster analysis does not consist of a single statistical procedure. There are many clustering techniques, each with its own requirements, options, and analytical strengths and weaknesses. Choosing the appropriate technique depends upon many factors, including the nature of the data, one's research objective, and the availability of software.

  • Research design issues
  • Problems that cluster analysis shares with all multivariate techniques
    • Selecting the appropriate data measurement tool (non-metric, ordinal metric, interval metric, ratio metric)
    • Selecting the appropriate variables
    • Validation and reliability measures
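
The construction of dendrograms described under Structural Modelling can be illustrated with a minimal sketch. It assumes single-linkage agglomerative clustering on Euclidean distances, which is only one of the many clustering techniques available; the observation labels and values are hypothetical and are not taken from the project data.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)


def single_linkage(points):
    """Agglomerative clustering that returns the merge history.

    `points` maps an observation label to a tuple of metric values.
    Each history entry is (members_a, members_b, merge_distance);
    read top to bottom, the entries describe a dendrogram.
    """
    clusters = [(label,) for label in points]  # start with one cluster per observation
    history = []
    while len(clusters) > 1:
        # Single linkage: the distance between two clusters is the
        # smallest distance between any pair of their members.
        def linkage(pair):
            a, b = pair
            return min(dist(points[i], points[j]) for i in a for j in b)

        a, b = min(combinations(clusters, 2), key=linkage)
        history.append((a, b, linkage((a, b))))
        # Replace the two closest clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history


# Hypothetical observations measured on two interval-scaled variables.
observations = {"A": (1.0, 1.0), "B": (1.5, 1.0), "C": (5.0, 5.0), "D": (5.0, 6.0)}
merges = single_linkage(observations)
for left, right, height in merges:
    print(left, "+", right, "at distance", round(height, 3))
```

In this example A and B merge first, then C and D, and the two pairs are joined last at a much larger distance, which is the pattern a dendrogram would display. A different linkage rule or proximity measure can produce a different hierarchy from the same data.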


Key Terms

  • Analytical routine - A statistical procedure that employs a particular proximity measure in order to group the observations of a population into identifiable clusters.

  • Attribute set - same as characteristic set or variable set. The words attributes, characteristics and variables are often used interchangeably.

  • Characteristic variable - a variable of the characteristic variable set.

  • Characteristic variable set - a set of variables whose values describe the observations of a sample population. These variables are used in the formation and identification of clusters.

  • Defining variables - variables that demonstrate high correlation with some variables, but little or no correlation with others that in turn correlate highly with still others.

  • Identification - the assignment of meaningful names to clusters.

  • Non-defining variables - variables that have been added to a cluster after a rotation.

  • Pivot variables - variables that correlate highly with some variables but demonstrate little correlation with others.

  • Proximity measures - Statistical constructs employed by clustering routines to assign observations and order clusters. The most broadly used measures of proximity are classified as follows:

    • Distance measures
      • Mahalanobis D2 proximity measure
    • Correlation measures
    • Similarity measures

  • Span diagram - a graphical representation of intervariable and intervariable-cluster relationships. Useful in determining structural relationships among clusters.
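
The three families of proximity measures listed above can be compared with a short sketch. The choice of Euclidean distance, Pearson correlation, and the simple matching coefficient as representatives is an assumption made for illustration, and the profiles are hypothetical. Mahalanobis D2, which additionally weights differences by the inverse covariance matrix of the variables, is left out to keep the example short.

```python
from math import sqrt


def euclidean_distance(x, y):
    """Distance measure for metric data: smaller means more alike."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Correlation measure: compares the pattern of two profiles, not their level."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def simple_matching(x, y):
    """Similarity measure for binary (non-metric) data: share of attributes that agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical profiles of two observations on four metric variables.
p = [1.0, 2.0, 3.0, 4.0]
q = [3.0, 4.0, 5.0, 6.0]
print(euclidean_distance(p, q))    # 4.0: the profiles differ in level
print(pearson_correlation(p, q))   # 1.0: yet their patterns are identical

# Hypothetical presence/absence attributes for two observations.
print(simple_matching([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5: two of four attributes agree
```

The first two results show why the choice of measure matters: the same pair of observations is far apart by distance but perfectly alike by correlation, so a clustering routine could group them differently depending on the measure it employs.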

Cluster Types

  • Well-defined Clusters - Clusters which exhibit both strong external isolation and good internal cohesion. See classification above.
  • Fuzzy Clusters - Fuzzy clusters are inherently vague and thus useful in helping to understand complex, overlapping phenomena for which well-defined structural differences are difficult to determine. See structural modelling above.
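
One way to make "internal cohesion" and "external isolation" concrete is sketched below. The two indices used here (mean pairwise distance within a cluster, and the smallest distance between members of two clusters) are assumptions chosen for illustration rather than definitions taken from the sources; the observations are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)


def cohesion(cluster):
    """Mean pairwise distance inside one cluster (smaller = more cohesive).

    Assumes the cluster has at least two members.
    """
    pairs = list(combinations(cluster, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def isolation(cluster_a, cluster_b):
    """Smallest distance between members of two clusters (larger = more isolated)."""
    return min(dist(a, b) for a in cluster_a for b in cluster_b)


# Hypothetical two-variable observations already assigned to two clusters.
low = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
high = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

print(cohesion(low), cohesion(high))  # spread within each cluster
print(isolation(low, high))           # gap between the two clusters
```

When the gap between clusters is large relative to the spread within them, as in this example, the clusters behave as well-defined. When the two quantities are of similar size, the groups overlap and are better treated as fuzzy.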

Reference List

Hair, Joseph F., Jr., Rolph E. Anderson, Ronald L. Tatham, and Bernie J. Grablowsky. 1984. Multivariate Data Analysis with Readings. New York: Macmillan Publishing Co.

Hughes, Adele. 1984. Class notes to graduate coursework in Applied Statistical Methods.

Punj, Girish and David W. Stewart. 1983. Cluster analysis in marketing research: review and suggestions for application. Journal of Marketing Research 20 (May) 134-48.

Sethi, S. P. 1971. Comparative cluster analysis for world markets. Journal of Marketing Research 8 (August) 348-54.