Digital libraries contain collections of digital objects, acquired from different sources, that can be represented through several metadata standards. These metadata are heterogeneous in both content and structure. This paper presents an approach for identifying duplicate metadata records that refer to objects in digital libraries. We propose similarity functions, designed for the digital library domain, that compare the content of metadata. Experiments show that, compared with three different baselines, the proposed functions improve the quality of metadata deduplication by 0.64% to 31.5%, while comparing authors' names with an algorithm of linear complexity.