Measuring the Complexity of Classification Problems

01 January 2000

We studied a number of measures that characterize the difficulty of a classification problem. We compared a set of real-world problems to random combinations of points in this measurement space and found that real problems contain structures that differ significantly from the random sets. The distribution of problems in this space shows that at least two independent factors affect a problem's difficulty, and that their effects may converge at the extremes. We suggest using this space to describe a classifier's domain of competence. This can guide static and dynamic selection of classifiers for specific problems, as well as for subproblems formed by confinement, projections, and transformations of the feature vectors.
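
The abstract does not list the individual measures, so the following is only an illustrative sketch of what one coordinate of such a measurement space might look like. It assumes Fisher's discriminant ratio, a standard class-separability statistic, as the example measure; the function name `fisher_ratio` and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Return the largest per-feature Fisher discriminant ratio.

    Each argument is a list of equal-length feature tuples for one class.
    For a single feature the ratio is (mean_a - mean_b)^2 / (var_a + var_b);
    larger values suggest the two classes are easier to separate on that feature.
    """
    ratios = []
    # zip(*rows) transposes the samples into per-feature columns.
    for col_a, col_b in zip(zip(*class_a), zip(*class_b)):
        numerator = (mean(col_a) - mean(col_b)) ** 2
        denominator = pvariance(col_a) + pvariance(col_b)
        if denominator == 0:
            # Constant feature in both classes: uninformative if the means
            # coincide, perfectly separating otherwise.
            ratios.append(0.0 if numerator == 0 else float("inf"))
        else:
            ratios.append(numerator / denominator)
    return max(ratios)


# Toy two-class problems (hypothetical data).
well_separated = ([(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)])
overlapping = ([(0.0,), (2.0,)], [(1.0,), (3.0,)])

easy_score = fisher_ratio(*well_separated)
hard_score = fisher_ratio(*overlapping)
```

Under this assumed measure, the well-separated problem scores far higher than the overlapping one. Several such scores taken together would give each problem a position in a measurement space of the kind the abstract describes.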