Quantifying the utility of the past in mining large databases
01 July 2000
Incremental mining algorithms that can efficiently derive the current mining output by utilizing previous mining results are attractive to business organizations since data mining is typically a resource-intensive recurring activity. In this paper, we present the DELTA algorithm for the robust and efficient incremental mining of association rules on large market basket databases. DELTA guarantees efficiency by ensuring that, for any dataset, at most three passes over the increment and one pass over the previous database are required to generate the desired rules. Further, it handles ``multi-support{''} environments where the support requirements for the current mining differ from those used in the previous mining, a feature in tune with the exploratory nature of the mining process. We present a performance evaluation of DELTA on large databases over a range of increment sizes and data distributions, as well as change in support requirements. The experimental results show that DELTA can provide significant improvements in execution times over previously proposed incremental algorithms in all these environments. In fact, for many workloads, its performance is close to that achieved by an optimal, but practically infeasible, algorithm. (C) 2000 Elsevier Science Ltd. All rights reserved.