Shrinking Trees.
30 July 1990
Tree-based models provide an alternative to linear models for classification and regression data. They are used primarily for exploratory analysis of complex data or as a diagnostic tool following a linear model analysis. They are also used as the end product in certain applications, such as speech recognition, medical diagnosis, and other settings where repeated fast classifications are required or where decision rules along coordinate axes facilitate understanding and communication of the model by practitioners in the field.

Historically, the key problem in tree-based modeling has been deciding on the right size of tree. This has been addressed by applying various stopping rules in the tree-growing process and, more recently, by applying a pruning procedure to an overly large tree. Both approaches are intended to eliminate `over-fitting' the data, especially when the tree is used for prediction.

The approach taken in this paper provides yet another way to protect against over-fitting. As in the pruning case, we start with an overly large tree, but rather than cut off branches which seem to contribute little to the overall fit, we simply smooth the fitted values using a process called recursive shrinking. The shrinking process is parameterized by a scalar theta which ranges from zero to one. A value of zero implies shrinking all fitted values to that of the root of the tree, whereas a value of one implies no shrinking whatsoever. The shrinking parameter must be specified or otherwise selected on the basis of the data. We have used cross-validation to guide the choice in certain of the applications we have examined.

Shrinking and pruning are qualitatively different, although they tend to have similar predictive ability. We draw on analogies with the usual linear model to emphasize the differences as well as the similarities between the two methods.
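The recursive scheme described above can be sketched in code. The abstract does not give an explicit formula, so the following is an assumption for illustration: each node's shrunken fitted value is taken as a convex combination of its own fitted value and its parent's already-shrunken value, which reproduces the two stated endpoints (theta = 0 collapses every fitted value to the root's; theta = 1 leaves the tree unchanged). The Node class and function names are hypothetical.

```python
# Illustrative sketch of recursive shrinking on a fitted tree.
# The recursion shrunk = theta * value + (1 - theta) * parent_shrunk
# is an assumption consistent with the endpoints described in the text,
# not a formula taken from the paper.

class Node:
    def __init__(self, value, children=()):
        self.value = value            # fitted value at this node (e.g. mean response)
        self.children = list(children)
        self.shrunk = None            # shrunken fitted value, filled in by shrink()

def shrink(node, theta, parent_shrunk=None):
    """Recursively smooth fitted values toward the root.

    theta = 1 leaves every fitted value unchanged; theta = 0 shrinks
    every fitted value to that of the root.
    """
    if parent_shrunk is None:         # the root keeps its own fitted value
        node.shrunk = node.value
    else:
        node.shrunk = theta * node.value + (1 - theta) * parent_shrunk
    for child in node.children:
        shrink(child, theta, node.shrunk)

# Example: a stump with root mean 5.0 and two leaves at 2.0 and 8.0.
root = Node(5.0, [Node(2.0), Node(8.0)])
shrink(root, theta=0.5)
print([c.shrunk for c in root.children])  # [3.5, 6.5]
```

With theta = 0.5 each leaf moves halfway toward the root's value; deeper nodes in a larger tree would be shrunk toward their already-shrunken parents, so the smoothing compounds down the branches.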