Fast approximate answers to aggregate queries on a data cube

01 August 1999

New Image

Modern decision support systems require very quick (interactive) responses from the DBMS, but pose complex queries on large volumes of data. In this paper, we present a novel solution to this problem: we precompute concise histogram statistics on the data to answer the queries quickly but approximately. Our hypothesis is that many decision support applications can tolerate small errors in query results in return for large reductions in response times. In particular, we propose the use of multiple histograms to approximate the data cube and answer aggregate queries approximately using this summarized data. We enhance histograms to estimate the quality of the approximate answers. We primarily explore the interaction among various histograms on the data cube in order to minimize the space needed when an upper bound on the errors is given. Our main contribution in this paper is an efficient technique for selecting a provably near-optimal set of histograms on the data cube. Extensive experiments show that our technique results in very accurate and concise statistics. Our technique is general in nature and can also be used for selecting a set of histograms (or other statistics) on a relation for the purpose of selectivity estimation