Models and algorithms for duplicate document detection

This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding solution algorithm derived from approximate string matching. The robustness of these techniques is demonstrated through a set of experiments using data that reflects real-world degradation effects.
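As a rough illustration of the kind of approximate string matching primitive the abstract refers to, the sketch below computes the classic Levenshtein edit distance and uses it to flag two texts as near-duplicates. This is not one of the paper's four models or algorithms, which the abstract does not specify. The function names `edit_distance` and `is_near_duplicate` and the `max_error_rate` threshold are assumptions made purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, keeping only two DP rows."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def is_near_duplicate(doc_a: str, doc_b: str, max_error_rate: float = 0.2) -> bool:
    """Hypothetical test: edit distance relative to the longer text.

    max_error_rate is an illustrative threshold, not a value from the paper.
    """
    longest = max(len(doc_a), len(doc_b))
    if longest == 0:
        return True  # two empty texts are trivially identical
    return edit_distance(doc_a, doc_b) / longest <= max_error_rate
```

Normalizing by the longer text's length keeps the threshold comparable across documents of different sizes, which matters when the differences come from degradation such as OCR errors rather than from genuine edits.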


