Fast Incremental Maintenance of Approximate Histograms
01 September 2002
Many commercial database systems maintain histograms to summarize the contents of large relations and permit efficient estimation of query result sizes for use in query optimizers. Delaying the propagation of database updates to the database, our techniques are the first to maintain histograms effectively up-to-date at all times and avoid computing overheads when unnecessary. Our techniques provide highly-accurate approximate histograms belonging to the equi-depth and Compressed classes. Experimental results show that our new approaches provide orders of magnitude more accurate estimation than previous approaches. An important aspect employed by these new approaches is a backing sample, and up-to-date random sample of the tuples currently in a relation. We provide efficient solutions for maintaining a uniformly random sample of a relation in the presence of updates to the relation. The backing sample techniques can be used for any other application that relies on random samples of data.