A central problem in data integration and data cleansing is to find entities in different data sources that describe the same real-world object. Many existing methods for identifying such entities rely on explicit linkage rules that specify the conditions entities must fulfill in order to be considered descriptions of the same real-world object. In this paper, we present the GenLink algorithm for learning expressive linkage rules from a set of existing reference links using genetic programming. The algorithm is capable of generating linkage rules that select discriminative properties for comparison, apply chains of data transformations to normalize property values, choose appropriate distance measures and thresholds, and combine the results of multiple comparisons using non-linear aggregation functions. Our experiments show that the GenLink algorithm outperforms the state-of-the-art genetic programming approach to learning linkage rules recently presented by Carvalho et al. and is capable of learning linkage rules that achieve accuracy similar to that of human-written rules for the same problem.
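To make the structure of such a linkage rule concrete, the following is a minimal Python sketch of a rule as a small operator tree: property selection, a chain of transformations, a distance measure with a threshold, and a non-linear aggregation (here `min`). The class names `Comparison` and `Aggregation`, the binary scoring scheme, and the example entities are assumptions made for illustration. They are not taken from the paper and do not reproduce GenLink's actual rule representation or its learning procedure, which would search over trees of this kind with genetic programming.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Entity = Dict[str, str]  # hypothetical entity: property name -> value


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


@dataclass
class Comparison:
    """Compares one property of each entity (illustrative, not GenLink's API)."""
    prop_a: str                               # property selected from source A
    prop_b: str                               # property selected from source B
    transforms: List[Callable[[str], str]]    # normalization chain
    distance: Callable[[str, str], float]     # distance measure
    threshold: float                          # maximum distance still counted as a match

    def score(self, a: Entity, b: Entity) -> float:
        va, vb = a[self.prop_a], b[self.prop_b]
        for t in self.transforms:
            va, vb = t(va), t(vb)
        # Assumption: a simple binary score; real systems may scale by distance.
        return 1.0 if self.distance(va, vb) <= self.threshold else 0.0


@dataclass
class Aggregation:
    """Combines child scores with an arbitrary (possibly non-linear) function."""
    function: Callable[[List[float]], float]
    operators: list  # Comparison or nested Aggregation nodes

    def score(self, a: Entity, b: Entity) -> float:
        return self.function([op.score(a, b) for op in self.operators])


# Hand-written example rule: title AND year must both match.
rule = Aggregation(min, [
    Comparison("name", "title", [str.strip, str.lower], levenshtein, 1),
    Comparison("year", "released", [], levenshtein, 0),
])

a = {"name": "  The Matrix ", "year": "1999"}
b = {"title": "the matrix", "released": "1999"}
c = {"title": "The Matrix Reloaded", "released": "2003"}

print(rule.score(a, b))  # entities a and b are linked
print(rule.score(a, c))  # entities a and c are not linked
```

Using `min` as the aggregation makes the rule behave like a logical AND over its comparisons, while `max` would behave like an OR. Because `Aggregation` nodes can contain other `Aggregation` nodes, rules of this shape can be nested, which is the kind of expressiveness the abstract refers to.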