Principal Profiles

A variety of multivariate techniques have been modified to analyze categorical data represented as relative frequencies summing to one. Correspondence analysis is an appropriately weighted principal component analysis, and is useful for uncovering low-dimensional structure in compositional profiles. However, the constraints of the simplex force certain forms of 'linear' structure to appear non-linear, the so-called 'arch effect'. This paper presents a new method for modelling compositional data, the principal profiles model. We finesse the arch effect by modelling the logit transform of the profiles, which is the natural parameter of the multinomial family. We discuss the formulation of the principal profiles model, its estimation by maximum likelihood, and its performance on some real data.
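
To illustrate the motivation, the following minimal sketch shows how a straight line in logit space maps to a curved path on the simplex. It is not the principal profiles model itself, whose formulation is not given here. The function names `multinomial_logit` and `inverse_logit`, the choice of the last category as reference, and the example direction `b` are all assumptions made for illustration.

```python
import numpy as np

def multinomial_logit(p, ref=-1):
    """Log-odds of each category against a reference category (assumed: the last)."""
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log(p[..., [ref]])

def inverse_logit(eta):
    """Map logits back to profiles on the simplex (softmax)."""
    eta = np.asarray(eta, dtype=float)
    z = np.exp(eta - eta.max(axis=-1, keepdims=True))  # subtract max for stability
    return z / z.sum(axis=-1, keepdims=True)

# Hypothetical one-dimensional structure: a straight line in logit space.
t = np.linspace(-3.0, 3.0, 7)
b = np.array([1.0, 0.0, -1.0])      # illustrative direction, three categories
eta = t[:, None] * b                # one row of logits per value of t
profiles = inverse_logit(eta)       # each row is non-negative and sums to one

# Second differences along t: zero for the logits, non-zero for the profiles.
logit_curvature = np.diff(multinomial_logit(profiles), n=2, axis=0)
profile_curvature = np.diff(profiles, n=2, axis=0)
```

In this example the middle category's share rises and then falls as `t` increases, so the profiles trace an arch even though the underlying logits are exactly linear in `t`.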