HKLNA Project - Multivariate Analysis of Variance and Covariance

English or languish - Probing the ramifications
of Hong Kong's language policy

Multivariate Analysis of Variance and Covariance
(MANOVA and MANOCOVA)


Key Features
Variables
  Dependent variables - metric (interval or ratio) - 2 or more
  Independent variables - non-metric (categorical) - 1 or more
  Covariate independent variables - metric (0, 1, or more)
Objective - Determine the effectiveness of various treatments and/or factors on specific characteristics of a known population through either random sampling or experimentation.
Statistical procedure - Compare within-group and between-group variances of multivariate groups in order to determine whether the differences between the groups' centroids are statistically significant when measured as a whole. This is achieved through partitioning of the sums-of-squares and cross-products (SSCP) matrix. MANOVA and MANOCOVA are multivariate extensions of ANOVA and ANOCOVA, respectively. Further testing is required in order to determine statistical significance between individual pairs of group centroids.
Null hypothesis - The equality of the dependent mean vectors (centroids) for each of two or more independent groups or treatments. See understanding the null-hypothesis.
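The partitioning of the SSCP matrix mentioned under "Statistical procedure" can be sketched as follows. This is a minimal illustration, not a full MANOVA routine: the function names (mean_vector, sscp, partition_sscp), the group labels, and the data values are all made up for the example. It splits the total SSCP matrix T into a between-group part B and a within-group part W, so that T = B + W.

```python
# Sketch: partition the total SSCP matrix into between- and within-group parts.
# All names and data values here are hypothetical, chosen for illustration.

def mean_vector(rows):
    """Column-wise mean of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sscp(rows, center):
    """Sums of squares and cross products of rows about a given center."""
    p = len(center)
    out = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - c for x, c in zip(row, center)]
        for i in range(p):
            for j in range(p):
                out[i][j] += dev[i] * dev[j]
    return out

def partition_sscp(groups):
    """Return (T, B, W) for a dict mapping group label -> list of rows."""
    all_rows = [row for rows in groups.values() for row in rows]
    grand = mean_vector(all_rows)          # grand centroid
    p = len(grand)
    total = sscp(all_rows, grand)          # T: variation about the grand centroid
    within = [[0.0] * p for _ in range(p)]
    between = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        centroid = mean_vector(rows)       # group centroid
        w_g = sscp(rows, centroid)         # variation about the group centroid
        dev = [c - g for c, g in zip(centroid, grand)]
        for i in range(p):
            for j in range(p):
                within[i][j] += w_g[i][j]
                between[i][j] += len(rows) * dev[i] * dev[j]
    return total, between, within

# Three treatment groups, two metric dependent variables (made-up values).
groups = {
    "A": [[2.0, 3.0], [3.0, 4.0], [4.0, 5.0]],
    "B": [[5.0, 4.0], [6.0, 6.0], [7.0, 5.0]],
    "C": [[8.0, 9.0], [9.0, 8.0], [10.0, 10.0]],
}
T, B, W = partition_sscp(groups)
```

The diagonal of each matrix holds the sums of squares of the individual dependent variables, and the off-diagonal entries hold their cross products; the latter are what a set of separate univariate ANOVAs would not take into account.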

Key Terms
Analysis of variance (ANOVA) - A statistical technique to determine if samples come from populations with equal means. Simply conceived, MANOVA is ANOVA with multiple dependent variables.
Centroid - The mean vector of the dependent variables for each treatment group.
Covariate analysis - ANOVA and MANOVA statistical experiments that include nuisance variables or covariates.
Criterion variables - Another name for dependent variables. With MANOVA and MANOCOVA experiments there is always more than one.
Effects
  Main (principal) effect - The effect of a single, independent, nonmetric variable (factor) on the dependent variable (ANOVA) or variables (MANOVA).
  Interaction (joint) effect - The joint effect of two or more independent variables on the dependent variable (ANOVA) or variables (MANOVA).
  Variation in the dependent variable(s) caused by main and interaction effects is measured separately. It is possible, for example, to have statistically significant interaction effects with no significant main effects.
Extraneous (nuisance) variables - Other names for metric independent variables that are included as part of the experimental design to remove noise when measuring the main and interaction effects of the independent, nonmetric, categorical variables on the metric dependent variables.
Factor (a treatment or experimental variable) - A nonmetric categorical variable manipulated by the researcher to effect changes in the dependent variable. In the case of one-way experimental design there is only one factor. Two-way and three-way experimental designs have two and three factors, respectively.
Multivariate normal distribution - A generalization of the univariate normal distribution involving two or more dependent variables.
SSCP Matrix - Sum of squares and cross products matrix. This matrix contains the various types of variation required to calculate the effects and their statistical significance.
Treatment level (treatment group) - A single factor may consist of more than one treatment level or group. Each observation of an experiment is subject to only one treatment per factor. In effect, each observation of a given group is manipulated (treated) in a way that observations of other groups of the same or different factors are not.
Wilks' Lambda Statistic - A multivariate extension of the F-test for ANOVA experiments. It is used to test for the significance of individual main effects and interaction (joint) effects of independent variable(s) on dependent variables.

Assumptions
Multivariate normality - The dependent variables within each treatment group are assumed to follow a multivariate normal distribution. This makes normal distribution of the error terms possible.
Homogeneity of variance across treatment groups - The variance-covariance matrix for each treatment group is identical to every other. In other words, the dispersion within each treatment group is assumed identical for all groups.
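The homogeneity assumption can be examined by comparing each group's variance-covariance matrix with the pooled matrix. The text above does not name a test for this; Box's M statistic is one commonly used check and is brought in here as an assumption. The sketch below computes only the M statistic, M = (N - g) ln|S_pooled| - Σ (n_i - 1) ln|S_i|, without its chi-square approximation. Function names and data values are hypothetical.

```python
import math

# Sketch of Box's M statistic (an assumed choice of check, not taken from
# the text above). All names and data values are hypothetical.

def covariance(rows):
    """Unbiased sample variance-covariance matrix of a list of rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, means)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            d = -d
        d *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return d

def box_m(groups):
    """Box's M for a list of groups, each a list of observation rows."""
    covs = [covariance(rows) for rows in groups]
    dfs = [len(rows) - 1 for rows in groups]   # n_i - 1 per group
    df_total = sum(dfs)                        # N - g
    p = len(covs[0])
    pooled = [[sum(df * s[i][j] for df, s in zip(dfs, covs)) / df_total
               for j in range(p)] for i in range(p)]
    return (df_total * math.log(det(pooled))
            - sum(df * math.log(det(s)) for df, s in zip(dfs, covs)))

# Groups a and b have identical dispersion (b is a shifted copy of a);
# group c has a different covariance structure.
a = [[2.0, 3.0], [3.0, 5.0], [4.0, 4.0], [5.0, 6.0]]
b = [[5.0, 4.0], [6.0, 6.0], [7.0, 5.0], [8.0, 7.0]]
c = [[8.0, 9.0], [9.0, 8.0], [10.0, 12.0], [13.0, 7.0]]

m_equal = box_m([a, b])       # close to zero: the dispersions match
m_unequal = box_m([a, b, c])  # positive: group c departs from the others
```

M is zero when all group matrices coincide and grows as they diverge. Each group needs more observations than dependent variables, otherwise its covariance matrix is singular and the logarithm is undefined.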

Experimental design
MANOVA and MANOCOVA experiments are, conceptually speaking, expanded versions of ANOVA and ANOCOVA that employ two or more dependent variables. In brief, MANOVA and MANOCOVA measure the principal and joint effects of one or more nonmetric independent variables on a vector of dependent variables. A few of the more common experimental designs using these statistical methods include:
One-way MANOVA Problem
  A single nonmetric independent variable
  Two or more metric dependent variables
One-way MANOCOVA Problem
  A single nonmetric independent variable
  Two or more metric dependent variables
  One or more metric covariates (nuisance variables)
Two-way MANOVA Problem
  Two nonmetric independent variables
  Two or more metric dependent variables
Two-way MANOCOVA Problem
  Two nonmetric independent variables
  Two or more metric dependent variables
  One or more metric covariates (nuisance variables)
Fixed, random, and mixed effects
  Fixed effect - The individual observations of a particular treatment group are randomly selected from a larger population before the experiment has taken place. Assignment to individual treatment groups is usually performed randomly, so as to more accurately measure the effect of the treatment on the dependent variable.
  Random effect - The individual observations of a particular treatment group are randomly selected from a larger population after the experiment has already taken place.
  Mixed effect - A combination of fixed and random effect experiments.

Testing the Null-Hypothesis
Commonly employed statistics for two treatment groups
  Wilks' Lambda Statistic (Special cases)
    Hotelling's T² Statistic
    Mahalanobis' D² Statistic
Commonly employed statistics for three or more treatment groups
  Wilks' Lambda Statistic
    The Wilks' Λ statistic for MANOVA corresponds to the F-statistic in ANOVA. Smaller values of Λ indicate greater statistical significance, as the variance within groups becomes increasingly smaller relative to the variance among groups. The distribution of the Wilks' Λ statistic corresponds well with that of the F-statistic in the following two cases:
      Case 1: Any number of dependent (criterion) variables, and up to three independent variables (factors and covariates)
      Case 2: Two dependent variables, and any number of independent variables
    Among the statistics reported by software for MANOVA and MANOCOVA, Wilks' Λ is the most common.
  Bartlett's V Statistic
  Rao's Ra function
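As a rough sketch of how Wilks' Λ is obtained, the statistic can be written as Λ = |W| / |B + W|, where W and B are the within-group and between-group SSCP matrices. The example below also applies the standard textbook F transformation for a one-way design with two dependent variables (Case 2 above). That transformation is not spelled out in the text and is included as an assumption, as are the function names and the matrix values.

```python
import math

# Sketch: Wilks' lambda from within (W) and between (B) SSCP matrices.
# Function names and the matrices below are hypothetical illustrations.

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            d = -d
        d *= a[k][k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
    return d

def wilks_lambda(between, within):
    """Lambda = |W| / |B + W|; values near 0 favour rejecting the null."""
    total = [[b + w for b, w in zip(rb, rw)]
             for rb, rw in zip(between, within)]
    return det(within) / det(total)

def f_for_two_dvs(lam, n_obs, n_groups):
    """F transform of lambda for exactly two dependent variables in a
    one-way design (assumed textbook form, not given in the text above).
    Returns (F, numerator df, denominator df)."""
    root = math.sqrt(lam)
    df1 = 2 * (n_groups - 1)
    df2 = 2 * (n_obs - n_groups - 1)
    f_stat = ((1.0 - root) / root) * ((n_obs - n_groups - 1) / (n_groups - 1))
    return f_stat, df1, df2

# Made-up SSCP matrices for 9 observations in 3 groups, 2 dependent variables.
W = [[6.0, 4.0], [4.0, 6.0]]
B = [[54.0, 45.0], [45.0, 42.0]]

lam = wilks_lambda(B, W)                       # equals 20 / 479
f_stat, df1, df2 = f_for_two_dvs(lam, n_obs=9, n_groups=3)
```

The resulting F value would then be compared against an F distribution with df1 and df2 degrees of freedom, using a table or a statistics package.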

Post Hoc Tests - Three or more treatment groups
In cases where the null hypothesis has been rejected, the researcher may wish to probe more deeply in an effort to determine which between-group differences are significant. Although one may be tempted to perform univariate F-tests for each of the dependent variables, this approach is discouraged, as it ignores possible or even likely correlations among the dependent variables. There are two standard procedures to distinguish between informative and noninformative variables, i.e., variables that do and do not provide information about the population under study. These procedures are:
  Protected F- and t-tests
  Canonical representation
Protected F- and t-tests
  The Protected F-test
    α₀ (experiment-wise type 1 error) - If the overall multivariate test statistic (e.g., Wilks' Λ statistic) is not statistically significant, then no further analysis is performed.
    α₁ (variable-wise type 1 error) - If overall statistical significance is found, then only those variables for which a univariate F-statistic demonstrates statistical significance are tested further. A standard rule of thumb is to set α₁ = α / p, where α is the desired error rate for the entire set of comparisons and p is the number of dependent variables to be compared.
  The Protected Hotelling T² and Protected t-tests
    α₂ (comparison-wise type 1 error) - Pairwise tests are performed on those variables which were significant at the α₁ level. The standard rule of thumb for setting α₂ is α₂ = α₃ · q, where q is the number of variables subject to pairwise comparison.
    α₃ (comparison-wise and variable-wise type 1 error) - Only those variable pairs for which statistical significance at the α₃ level is found are interpreted.
Canonical representation
  The standard statistical technique for performing canonical representation is discriminant analysis. Canonical representation has as its objective the reduction of dimensionality through the use of optimal artificial variates. All of the original variates are retained, but only a limited number of linear combinations of them (the canonical variates) are interpreted. Briefly, canonical variates may be used to represent the centroids of n groups in an (n - 1)-dimensional subspace of the original p-dimensional space (p being the number of variates used in the MANOVA or MANOCOVA experiment), provided that p ≥ n - 1. The interpretation of differences between the populations is usually made by assigning some substantive meaning to each of the canonical variates through an interpretation of the coefficients of the linear combination.
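The first two gates of the protected testing sequence described earlier in this section (the overall test at α₀, then the univariate F-tests at α₁ = α / p) can be sketched as a simple filter. This is an illustration under stated assumptions: the function name, the variable names, and the p-values are invented, and p is taken to be the number of variables given a univariate test. The p-values themselves would come from a statistics package.

```python
# Sketch of the protected F-test gate. The function name, variable names,
# and p-values are hypothetical; only the gating logic is illustrated.

def protected_f_gate(overall_p, univariate_p, alpha_0=0.05, alpha=0.05):
    """Return the variables that may proceed to pairwise comparison.

    overall_p    -- p-value of the overall multivariate test (e.g. Wilks' lambda)
    univariate_p -- dict mapping variable name -> univariate F-test p-value
    alpha_0      -- experiment-wise level for the overall test
    alpha        -- desired error rate for the whole set of univariate tests
    """
    # Gate 1: without overall significance, no further analysis is performed.
    if overall_p >= alpha_0:
        return []
    # Gate 2: variable-wise level, using the rule of thumb alpha_1 = alpha / p.
    alpha_1 = alpha / len(univariate_p)
    return [name for name, p in univariate_p.items() if p < alpha_1]

# Made-up p-values for three variables.
p_values = {"reading": 0.004, "writing": 0.030, "listening": 0.200}

passed = protected_f_gate(overall_p=0.003, univariate_p=p_values)
blocked = protected_f_gate(overall_p=0.200, univariate_p=p_values)
```

With three variables and α = 0.05, the variable-wise level is about 0.0167, so only "reading" passes in the first call; the second call returns an empty list because the overall test is not significant. The pairwise stage (α₂ and α₃) would apply the same kind of filter to the variables returned here.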

Reference List
Hair, Joseph F., Jr., Rolph E. Anderson, Ronald L. Tatham, and Bernie J. Grablowsky. 1984. Multivariate Data Analysis with Readings. New York: Macmillan Publishing Co.
Hughes, Adele. 1984. Class notes to graduate coursework in Applied Statistical Methods.