In this paper we propose and implement a method for detecting duplicate documents in very large image databases. The method is based on a robust "signature" extracted from each document image, which is used to index into a table of previously processed documents. The approach has a number of advantages over OCR and other recognition-based methods, including speed and robustness to imaging distortions. To justify the approach and test its scalability, we have developed a simulator that allows us to change system parameters and examine performance for millions of document signatures. A complete system is implemented and tested on a collection of technical articles and memos.
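
The signature-and-table idea can be illustrated with a minimal sketch. The abstract does not say how the signature is computed, so everything below is an assumption made for illustration only: the `signature` function (quantized ink density over a coarse grid), the `DuplicateIndex` class, and the `grid` and `levels` parameters are hypothetical names, and exact-match lookup in a dictionary stands in for whatever indexing scheme the actual system uses.

```python
from collections import defaultdict


def signature(image, grid=4, levels=4):
    """Hypothetical signature, NOT the paper's: quantized ink density per grid cell.

    `image` is a binarized page given as a list of rows of 0/1 pixels.
    Coarse quantization is what lets small pixel-level distortions
    map to the same signature.
    """
    rows, cols = len(image), len(image[0])
    sig = []
    for gy in range(grid):
        for gx in range(grid):
            r0, r1 = gy * rows // grid, (gy + 1) * rows // grid
            c0, c1 = gx * cols // grid, (gx + 1) * cols // grid
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            area = (r1 - r0) * (c1 - c0)
            # Map the ink fraction to one of `levels` buckets.
            sig.append(min(levels - 1, ink * levels // area))
    return tuple(sig)


class DuplicateIndex:
    """Table of previously processed documents, keyed by signature (illustrative)."""

    def __init__(self):
        self.table = defaultdict(list)

    def add(self, doc_id, image):
        """Return ids of earlier documents with the same signature, then store this one."""
        sig = signature(image)
        matches = list(self.table[sig])
        self.table[sig].append(doc_id)
        return matches


# Usage: a 16x16 page, a slightly distorted rescan of it, and a different page.
page_a = [[1] * 16 for _ in range(8)] + [[0] * 16 for _ in range(8)]
noisy = [row[:] for row in page_a]
noisy[0][0] = 0      # one dropped ink pixel
noisy[15][15] = 1    # one speck of noise
page_b = [[1] * 8 + [0] * 8 for _ in range(16)]

index = DuplicateIndex()
first = index.add("a", page_a)
rescan = index.add("a-rescan", noisy)
other = index.add("b", page_b)
```

In this sketch the rescan is reported as a duplicate of the first page without any character recognition, because each lookup is a single table probe on the signature. A distortion that pushes a cell across a quantization boundary would be missed by exact matching, so a real system would need a more tolerant signature or lookup than the one assumed here.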