Measures of Geometrical Complexity in Classification Problems

01 January 2005

Decades of classification research have yielded many useful classifiers, yet their accuracy in many practical applications remains far from perfect. Possible causes are deficiencies in the algorithms, intrinsic difficulties in the data, or a mismatch between methods and problems. We address this question by pursuing a better understanding of the geometrical and topological characteristics of point sets in high-dimensional spaces, to provide a basis for analyzing classifier behavior. We discuss several measures useful for this characterization and their utility in analyzing data sets with known or controlled complexity.