Data compression with long repeated strings
01 June 2001
Lempel-Ziv schemes compress data by encoding repeated strings that occur in a small sliding window. We propose a scheme that succinctly encodes long strings that appear far apart in the input text. Such long strings are rare in most documents, but occur frequently in data such as large software systems, subroutine libraries, news articles, and other corpora of real documents. Analysis shows that our scheme is computationally efficient, and experiments show that it effectively compresses some classes of input. (C) 2001 Published by Elsevier Science Inc.
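
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of encoding long repeats that lie far apart, not the authors' method. It assumes one simple approach: index fixed-size blocks of the already-seen text in a dictionary, and when the text at the current position matches an indexed block, extend the match and emit a (start, length) reference in place of the repeated string. The names `compress`, `decompress`, and `block` are hypothetical, and the plain dictionary keyed by block content stands in for whatever lookup structure the paper actually uses.

```python
from typing import List, Tuple, Union

Token = Union[str, Tuple[int, int]]  # literal text, or (start, length) reference


def compress(text: str, block: int = 8) -> List[Token]:
    """Replace repeats of at least `block` characters with (start, length)."""
    table = {}       # block content -> earliest start position of that block
    out: List[Token] = []
    n = len(text)
    next_index = 0   # next block-aligned position waiting to be indexed
    lit_start = 0    # start of the pending run of literal characters
    i = 0
    while i < n:
        # Index only blocks that end at or before i, so a reference never
        # overlaps the text it is encoding.
        while next_index + block <= i:
            table.setdefault(text[next_index:next_index + block], next_index)
            next_index += block
        cand = table.get(text[i:i + block]) if i + block <= n else None
        if cand is None:
            i += 1
            continue
        # Extend the match forward as far as it goes without overlap.
        length = block
        while (i + length < n and cand + length < i
               and text[cand + length] == text[i + length]):
            length += 1
        if lit_start < i:
            out.append(text[lit_start:i])   # flush pending literals
        out.append((cand, length))
        i += length
        lit_start = i
    if lit_start < n:
        out.append(text[lit_start:])
    return out


def decompress(tokens: List[Token]) -> str:
    """Rebuild the text by copying each reference from the output so far."""
    buf = ""
    for tok in tokens:
        if isinstance(tok, str):
            buf += tok
        else:
            start, length = tok
            buf += buf[start:start + length]
    return buf
```

Because the table covers the whole input seen so far rather than a small sliding window, a repeat can be found no matter how far back its first occurrence lies. The cost of indexing only block-aligned positions is that repeats shorter than `block` are never found, and longer ones may be missed if they do not contain a complete aligned block.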