Bandwidth selection: Classical or plug-in?

01 April 1999

Bandwidth selection for procedures such as kernel density estimation and local regression has been widely studied over the past decade. Substantial "evidence" has been collected to establish superior performance of modern plug-in methods in comparison to methods such as cross validation; this has ranged from detailed analysis of rates of convergence, to simulations, to superior performance on real datasets. In this work we take a detailed look at some of this evidence, looking into the sources of differences. Our findings challenge the claimed superiority of plug-in methods on several fronts. First, plug-in methods are heavily dependent on arbitrary specification of pilot bandwidths and fail when this specification is wrong. Second, the often-quoted variability and undersmoothing of cross validation simply reflects the uncertainty of bandwidth selection; plug-in methods reflect this uncertainty by oversmoothing and missing important features when given difficult problems. Third, we look at asymptotic theory. Plug-in methods use available curvature information in an inefficient manner, resulting in inefficient estimates. Previous comparisons with classical approaches penalized the classical approaches for this inefficiency. Asymptotically, the plug-in based estimates are beaten by their own pilot estimates.
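To make the contrast concrete, the following is a minimal sketch, not the procedure studied in the paper. It assumes a Gaussian kernel and least-squares cross validation for density estimation, and it uses the normal reference rule (1.06 times the sample standard deviation times n^(-1/5)) as a simple stand-in for a selector that relies on a fixed assumption about curvature. It is not one of the plug-in methods the paper evaluates. The function names and the bimodal test sample are hypothetical.

```python
import math
import random


def lscv_score(data, h):
    """Least-squares cross-validation criterion for a Gaussian-kernel
    density estimate: integral of fhat^2 minus twice the mean
    leave-one-out density at the observations."""
    n = len(data)
    conv_sum = 0.0  # all pairs (i, j), kernel convolved with itself: N(0, 2) density
    loo_sum = 0.0   # pairs with i != j, standard normal kernel
    for i in range(n):
        for j in range(n):
            u = (data[i] - data[j]) / h
            conv_sum += math.exp(-u * u / 4.0) / (2.0 * math.sqrt(math.pi))
            if i != j:
                loo_sum += math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    return conv_sum / (n * n * h) - 2.0 * loo_sum / (n * (n - 1) * h)


def lscv_bandwidth(data, grid):
    """Return the candidate bandwidth in `grid` with the smallest LSCV score."""
    return min(grid, key=lambda h: lscv_score(data, h))


def normal_reference_bandwidth(data):
    """Normal reference rule for a Gaussian kernel: 1.06 * sd * n^(-1/5).
    Assumes the curvature of a normal density rather than estimating it."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


if __name__ == "__main__":
    # Hypothetical bimodal sample: two well-separated normal components.
    rng = random.Random(0)
    sample = [rng.gauss(-2.0, 0.5) for _ in range(100)]
    sample += [rng.gauss(2.0, 0.5) for _ in range(100)]
    grid = [0.05 * k for k in range(1, 31)]  # candidates 0.05, 0.10, ..., 1.50
    print("LSCV bandwidth:", lscv_bandwidth(sample, grid))
    print("normal reference bandwidth:", normal_reference_bandwidth(sample))
```

On a sample like this one, whose spread comes mostly from the gap between the modes, the normal reference rule will tend to choose a larger bandwidth than cross validation, which is the kind of oversmoothing on difficult problems that the abstract describes. The grid search is quadratic in the sample size for each candidate and is meant only for small illustrations.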