Space Efficient Streaming Algorithms for the Maximum Error Histogram
01 January 2007
We propose new algorithms for constructing maximum error (L−) histograms in the data stream model. Our first algorithm (Min-Merge) achieves the following performance guarantee: using O(B) memory, it constructs a 2B-bucket histogram whose approximation error is at most the error of the optimal B-bucket histogram. Our second algorithm (Min-Increment) achieves a (1 + ε)-approximation of a B-bucket histogram using O(ε-1 B log U) space, where U is the size of the domain for data values. The memory requirements of these algorithms are a significant improvement over the previous best schemes for constructing near-optimal histograms in the data stream model, making them ideal for data summary applications where memory is at a premium, such as wireless sensor networks. Our Min-Increment algorithm also extends to the sliding window model without any asymptotic increase in space. Finally, using synthetic and real-world data, we show that our algorithms are indeed as space-efficient in practice as their theoretical analysis predicts - compared to previous best algorithms, they require two or more orders of magnitude less memory for the same approximation error.