One challenge for the Semantic Web is to scalably establish high-quality owl:sameAs links between coreferent ontology instances in different data sources; traditional approaches that exhaustively compare every pair of instances do not scale well to large datasets. In this paper, we propose a pruning-based algorithm for reducing the complexity of entity coreference. First, we discard candidate pairs of instances that are not sufficiently similar to the same pool of other instances. A sigmoid-function-based thresholding method is proposed to automatically adjust the threshold for such commonality on the fly. In our prior work, each instance is associated with a context graph consisting of neighboring RDF nodes. In this paper, we speed up the comparison for a single pair of instances by pruning insignificant context in the graph; this is accomplished by evaluating its potential contribution to the final similarity measure. We evaluate our system on three Semantic Web instance categories. We verify the effectiveness of our thresholding and context pruning methods by comparing to nine state-of-the-art systems. We show that our algorithm frequently outperforms those systems with a runtime speedup factor of 18 to 24 while maintaining competitive F1-scores. For datasets of up to 1 million instances, this translates to an improvement of as much as 370 hours in runtime.
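
The candidate-pruning idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: the function names `sigmoid_threshold` and `candidate_pairs`, the parameters `base`, `ceiling`, `midpoint`, and `steepness`, and the choice to scale the threshold by the smaller of the two pool sizes. It shows only the general shape of the technique: count how many pool instances two instances are both similar to, and keep the pair only if that count clears a sigmoid-shaped threshold.

```python
import math
from collections import defaultdict
from itertools import combinations


def sigmoid_threshold(pool_size, base=1.0, ceiling=4.0, midpoint=10.0, steepness=0.5):
    """Hypothetical threshold: rises smoothly from `base` toward `ceiling`
    as the pool of similar instances grows. All parameters are assumptions."""
    return base + (ceiling - base) / (1.0 + math.exp(-steepness * (pool_size - midpoint)))


def candidate_pairs(similar_to, **threshold_params):
    """Keep only pairs that share enough similar pool instances.

    similar_to maps each instance id to the set of pool instances it is
    considered similar to (how that set is built is outside this sketch).
    """
    # Inverted index: pool instance -> instances similar to it.
    inverted = defaultdict(set)
    for instance, pool in similar_to.items():
        for pool_item in pool:
            inverted[pool_item].add(instance)

    # Count shared pool instances only for pairs that co-occur at least once,
    # so pairs with nothing in common are never enumerated.
    shared = defaultdict(int)
    for members in inverted.values():
        for a, b in combinations(sorted(members), 2):
            shared[(a, b)] += 1

    kept = []
    for (a, b), count in shared.items():
        smaller_pool = min(len(similar_to[a]), len(similar_to[b]))
        if count >= sigmoid_threshold(smaller_pool, **threshold_params):
            kept.append((a, b))
    return sorted(kept)


if __name__ == "__main__":
    toy = {
        "x1": {"p1", "p2", "p3"},
        "x2": {"p1", "p2", "p3"},
        "x3": {"p3", "p9"},
    }
    print(candidate_pairs(toy))
```

In this toy input, `x1` and `x2` share three pool instances and survive, while pairs involving `x3` share only one and fall just below the threshold. Only the surviving pairs would then go on to the more expensive context-graph comparison.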