Duplicate detection approaches for quality assurance of document image collections

Authors:
Roman Graf;Reinhold Huber-Mörk;Alexander Schindler;Sven Schlarb
Affiliations:
Networks and Services, AIT - Austrian Institute of Technology GmbH, Vienna, Austria;Research Area Intelligent Vision Systems, AIT - Austrian Institute of Technology GmbH, Vienna, Austria;Research Area Intelligent Vision Systems, AIT - Austrian Institute of Technology GmbH, Vienna, Austria;Austrian National Library, Vienna, Austria
Venue:
Proceedings of the Fifth International Conference on Management of Emergent Digital EcoSystems
Year:
2013

Abstract

This paper evaluates several methods for automatic duplicate detection in digitized collections. The methods are meant to support quality assurance and decision making for long-term preservation of digital content in libraries and archives. We demonstrate the advantages and drawbacks of each approach, with the goal of selecting the most efficient method that satisfies the digital preservation requirements for duplicate detection in document image collections. Workflows of differing complexity were designed to demonstrate possible duplicate detection approaches. Each approach is assessed on workflow simplicity, detection accuracy, and acceptable computational performance, since image processing methods typically require significant computation. The applied image processing methods produce expert knowledge that facilitates decision making for long-term preservation. We employ AI techniques such as expert rules and clustering to infer explicit knowledge about the content of the digital collection. The evaluation part of the paper presents a statistical analysis of the aggregated information and a qualitative analysis of the aggregated knowledge.