Complexity Measures of Supervised Classification Problems
01 March 2002
We studied a number of measures that characterize the difficulty of a classification problem, with a focus on the geometrical complexity of the class boundary. We compared a set of real-world problems to random labelings of points in this measurement space and found that real problems contain structures that differ significantly from those of the random sets. The distribution of problems in this space shows that at least two independent factors affect a problem's difficulty. We suggest using this space to describe a classifier's domain of competence. Such a description can guide static and dynamic selection of classifiers for specific problems, as well as for subproblems formed by confinement, projection, and transformation of the feature vectors.
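To make the comparison between a structured problem and a random labeling concrete, the following minimal sketch computes one simple complexity measure on a toy two-class problem and again after the labels are shuffled. The choice of measure (the maximum Fisher discriminant ratio over features), the synthetic Gaussian data, and all function names are assumptions made for illustration; the abstract itself does not name the measures or the data sets used.

```python
import random
from statistics import mean, pvariance


def fisher_ratio(points, labels):
    """Maximum over features of (mu_a - mu_b)^2 / (var_a + var_b), two classes."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")
    best = 0.0
    for j in range(len(points[0])):
        a = [p[j] for p, y in zip(points, labels) if y == classes[0]]
        b = [p[j] for p, y in zip(points, labels) if y == classes[1]]
        denom = pvariance(a) + pvariance(b)
        if denom > 0:
            best = max(best, (mean(a) - mean(b)) ** 2 / denom)
    return best


def make_toy_problem(n_per_class=200, seed=0):
    """Hypothetical stand-in for a real problem: two Gaussian classes in 2-D."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            # Classes are separated along the first feature only.
            points.append((rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)))
            labels.append(label)
    return points, labels


def random_labeling(labels, seed=1):
    """Keep the points fixed but shuffle the labels, destroying class structure."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


points, labels = make_toy_problem()
structured = fisher_ratio(points, labels)
randomized = fisher_ratio(points, random_labeling(labels))
print(f"structured labels: {structured:.3f}")
print(f"random labels:     {randomized:.3f}")
```

With well-separated classes the ratio should be much larger than under the random labeling, where the class means nearly coincide. A single measure gives only one coordinate; a measurement space of the kind the abstract describes would use several such measures together.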