 English or languish - Probing the ramifications
of Hong Kong's language policy
Multiple Discriminant Analysis (MDA)    


Step 2 - Validation

  • General points of interest

    • Classification matrices
    • Cutting score determination
    • Chance models

  • Classification matrices
    • Measures of the discriminant function's significance are generally of little value on their own: a function may classify individual observations poorly even when the distance between the group centroids is statistically significant. How well the discriminant function classifies individual group members is a better test of the function's utility.
    • Hit ratios
      A hit ratio is the percentage of correctly classified observations. Insofar as they tell how much of the total variation in the dependent variable is accounted for by the discriminant function, hit ratios are somewhat analogous to R-square values in regression analysis. Accordingly, the F-statistic in regression analysis and the Chi-square statistic in discriminant analysis are analogous.
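The hit ratio can be read straight off a classification matrix. A minimal sketch, using hypothetical group labels for a ten-observation sample:

```python
# Hypothetical known and predicted group memberships for a hold-out
# sample of 10 observations (0 = group A, 1 = group B)
actual    = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Classification matrix: rows = actual group, columns = predicted group
matrix = [[0, 0], [0, 0]]
for a, p in zip(actual, predicted):
    matrix[a][p] += 1

# Hit ratio: correctly classified observations (the diagonal) / total
hits = matrix[0][0] + matrix[1][1]
hit_ratio = hits / len(actual)
print(matrix)      # [[4, 1], [1, 4]]
print(hit_ratio)   # 0.8
```

The off-diagonal cells are the misclassifications; a perfect function would leave them at zero.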
    • Cutting scores (Critical Z-values) - A cutting score is the decision rule for determining an individual observation's group membership. If the groups are of equal size, then the cutting scores are equal to the midpoints between the centroids of the different groups. When the groups are of different size, the centroids must be weighted by group size.

Zc = (Na·Za + Nb·Zb) / (Na + Nb)


Zc = the critical Z (the cutting score)
Na = number of observations in group A
Nb = number of observations in group B
Za = centroid of group A
Zb = centroid of group B

Normal distribution is assumed for the Z-scores around their centroids.
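Applying the weighted formula is a one-line calculation. A sketch with hypothetical group sizes and centroids:

```python
# Weighted cutting score: Zc = (Na*Za + Nb*Zb) / (Na + Nb)
Na, Za = 60, -1.2   # hypothetical size and centroid of group A
Nb, Zb = 40,  1.8   # hypothetical size and centroid of group B

Zc = (Na * Za + Nb * Zb) / (Na + Nb)
print(Zc)  # 0.0

# Decision rule: an observation's discriminant score Z is compared
# against Zc; here, scores below Zc are assigned to group A
```

Note that the larger group pulls the cutting score toward the other group's centroid, reflecting the greater prior probability of membership in the larger group.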

    • Optimal cutting scores - Weighted cutting scores that do not take into account the cost of misclassification are not optimal, unless the cost of misclassification is the same for both groups.
  • Procedure
    cluster analysis (research design issues)

    After the sample has been split and the discriminant function determined, the observations of the hold-out sample are classified by the function and placed into a table that compares their estimated membership with the known membership.

    • A t-test is employed to determine the level of significance of the discriminant function's ability to classify the observations correctly. For a two-group analysis with equal sample size the following formula is employed:

t = (p - 0.5) / [0.5(1 - 0.5) / n]^(1/2)

where p = the proportion of correctly classified observations
and n = the size of the entire sample

This formula can be modified to include more than two groups of different size.
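The two-group, equal-size version of the test is straightforward to compute. A minimal sketch with a hypothetical hold-out sample:

```python
from math import sqrt

def classification_t(p, n):
    """t statistic testing whether the hit ratio p beats the 50%
    chance level expected for two equal-sized groups; n is the
    size of the entire sample."""
    return (p - 0.5) / sqrt(0.5 * (1 - 0.5) / n)

# Hypothetical hold-out result: 70% correctly classified, n = 100
t = classification_t(0.70, 100)
print(round(t, 2))  # 4.0
```

A t value this large would be significant at conventional levels, indicating the function classifies better than chance.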

    • Chance models - a few rules of thumb

      In general the discriminant function should be able to predict the proper group for each observation better than chance itself. How much better will depend on the cost of generating the discriminant function (that is, the cost of the analysis) and the actual value derived from accurately predicting group membership.

      Two common rules of thumb in this regard are the
      • Maximum chance criterion, and
      • Proportional chance criterion

Chance models only produce accurate measures when split samples are possible.
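Both criteria reduce to simple proportions. A sketch with a hypothetical unequal two-group sample:

```python
# Hypothetical two-group sample: 60 observations in A, 40 in B
p = 60 / 100  # proportion in the larger group

# Maximum chance criterion: the hit ratio achieved by simply
# classifying every observation into the larger group
C_max = max(p, 1 - p)        # 0.6

# Proportional chance criterion: the expected hit ratio from
# random assignment in proportion to the group sizes
C_pro = p**2 + (1 - p)**2    # 0.52

print(C_max, C_pro)
```

The discriminant function's hit ratio should comfortably exceed whichever criterion is chosen (a common rule of thumb is by at least 25%) before the function is considered useful.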

    • Interpretation of the discriminant function becomes useful only if both statistical significance of the function and a satisfactory level of classification accuracy are achieved.

Go to step 3