This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each paired with an algorithm drawn from approximate string matching. The robustness of these techniques is demonstrated through experiments on data that reflect real-world degradation effects.
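As a rough illustration of how approximate string matching can be applied to duplicate detection, the sketch below compares two text strings by edit distance and flags them as near-duplicates when the error rate is small. It is not one of the paper's four models, which the abstract does not specify. The function names, the 0.15 threshold, and the OCR-style sample strings are all assumptions chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_near_duplicate(a: str, b: str, max_error_rate: float = 0.15) -> bool:
    """Hypothetical decision rule: edit distance relative to the longer text.

    The 0.15 default is an illustrative assumption, not a value from the paper.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two empty texts are trivially identical
    return edit_distance(a, b) / longest <= max_error_rate


# Invented sample: a clean string and a copy with OCR-like errors.
clean = "duplicate document detection"
degraded = "dup1icate docurnent detection"
print(edit_distance(clean, degraded), is_near_duplicate(clean, degraded))
```

A whole-string comparison like this is quadratic in text length, so a practical system would probably restrict it to short fields or to candidate pairs selected by a cheaper filter.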