In the context of Linked Data, different kinds of semantic links can be established between data items. When data sources are huge, however, detecting such links manually is not feasible. One of the most important types of link, the identity link, states that different identifiers refer to the same real-world entity. Some automatic data linking approaches use keys to infer identity links, but this kind of knowledge is rarely available. In this work we propose KD2R, an approach for automatically discovering composite keys in RDF data sources that may conform to different schemas. We only consider data sources for which the Unique Name Assumption holds. The discovered keys are correct with respect to the RDF data sources in which they are discovered. The proposed algorithm is scalable because it discovers keys without having to scan all the data. KD2R has been tested on real datasets from the international contest OAEI 2010 and on datasets available on the Web of Data, and has obtained promising results.
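To make the notion of a composite key concrete, the following minimal Python sketch enumerates minimal keys over a toy set of triples. It is not the KD2R algorithm: it is a naive exhaustive search that scans all the data, which is exactly what the proposed approach is said to avoid. The toy data, the function names, and the `max_size` parameter are hypothetical. The sketch assumes that two instances violate a candidate key when they share at least one value for every property in it, that a missing property value never counts as shared, and that distinct identifiers denote distinct entities (the Unique Name Assumption).

```python
from itertools import combinations

# Toy data: (subject, property, value) triples. All names are hypothetical.
TRIPLES = [
    ("r1", "name", "Chez Paul"), ("r1", "city", "Paris"), ("r1", "phone", "01"),
    ("r2", "name", "Chez Paul"), ("r2", "city", "Lyon"), ("r2", "phone", "02"),
    ("r3", "name", "Le Zinc"), ("r3", "city", "Paris"), ("r3", "phone", "03"),
]


def index_by_subject(triples):
    """Group triples as {subject: {property: set of values}}."""
    desc = {}
    for s, p, v in triples:
        desc.setdefault(s, {}).setdefault(p, set()).add(v)
    return desc


def is_key(props, desc):
    """True if no two distinct subjects share a value for every property.

    Assumption: a missing property is treated as "no shared value".
    Under the Unique Name Assumption, distinct subjects are distinct
    entities, so any such pair is a genuine counterexample.
    """
    for a, b in combinations(desc, 2):
        if all(desc[a].get(p, set()) & desc[b].get(p, set()) for p in props):
            return False
    return True


def minimal_keys(triples, max_size=2):
    """Naively enumerate minimal keys with at most max_size properties."""
    desc = index_by_subject(triples)
    props = sorted({p for _, p, _ in triples})
    keys = []
    for size in range(1, max_size + 1):
        for cand in combinations(props, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(cand) for k in keys):
                continue
            if is_key(cand, desc):
                keys.append(cand)
    return keys


print(minimal_keys(TRIPLES))
```

On this toy data, `phone` alone is a key, while `name` and `city` are keys only in combination: two instances share a name and two share a city, but no two share both.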