An automatic key discovery approach for data linking

Authors:
Nathalie Pernelle; Fatiha Saïs; Danai Symeonidou
Venue:
Web Semantics: Science, Services and Agents on the World Wide Web
Year:
2013

Abstract

In the context of Linked Data, different kinds of semantic links can be established between data. However, when data sources are huge, detecting such links manually is not feasible. One of the most important types of links, the identity link, expresses that different identifiers refer to the same real-world entity. Some automatic data linking approaches use keys to infer identity links; nevertheless, this kind of knowledge is rarely available. In this work, we propose KD2R, an approach for the automatic discovery of composite keys in RDF data sources that may conform to different schemas. We only consider data sources for which the Unique Name Assumption is fulfilled. The obtained keys are correct with respect to the RDF data sources in which they are discovered. The proposed algorithm is scalable because it discovers keys without having to scan all the data. KD2R has been tested on real datasets from the international contest OAEI 2010 and on datasets available on the web of data, and has obtained promising results.
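To make the notion of a composite key concrete, the following is a minimal sketch and not the KD2R algorithm itself. It assumes the Unique Name Assumption holds (distinct identifiers denote distinct entities) and exhaustively checks property combinations, whereas the abstract states that KD2R avoids scanning all the data. The function names `is_key` and `minimal_keys`, the `max_size` parameter, the treatment of missing property values, and the toy restaurant data are all illustrative assumptions.

```python
from itertools import combinations


def is_key(instances, props):
    """Return True if no two distinct instances share a value for every property in props.

    Assumption: an instance with no value for a property shares nothing on it.
    """
    descriptions = list(instances.values())
    for a, b in combinations(descriptions, 2):
        if all(a.get(p, set()) & b.get(p, set()) for p in props):
            return False  # a and b agree on all properties, so props is not a key
    return True


def minimal_keys(instances, properties, max_size=3):
    """Naively enumerate minimal composite keys, smallest property sets first."""
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(properties), size):
            candidate = frozenset(combo)
            # Skip supersets of keys already found: they are keys but not minimal.
            if any(k <= candidate for k in keys):
                continue
            if is_key(instances, candidate):
                keys.append(candidate)
    return keys


# Hypothetical data: each identifier maps properties to sets of values.
restaurants = {
    "r1": {"name": {"Chez Paul"}, "city": {"Paris"}, "phone": {"0101"}},
    "r2": {"name": {"Chez Paul"}, "city": {"Lyon"}, "phone": {"0202"}},
    "r3": {"name": {"Le Zinc"}, "city": {"Paris"}, "phone": {"0303"}},
}

keys = minimal_keys(restaurants, {"name", "city", "phone"})
```

On this toy data, `phone` alone is a key, and `name` together with `city` forms a composite key even though neither property is a key on its own. Such keys are only guaranteed to hold for the data in which they were found, which matches the abstract's statement that the obtained keys are correct with respect to the sources in which they are discovered.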