Accuracy vs. Speed: Scalable Entity Coreference on the Semantic Web with On-the-Fly Pruning

Authors:
Dezhao Song;Jeff Heflin
Affiliations:
-;-
Venue:
WI-IAT '12 Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Year:
2012

Citing 23
Cited 0

Efficient exact set-similarity joins

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Scaling up all pairs similarity search

Proceedings of the 16th international conference on World Wide Web
Adaptive sorted neighborhood methods for efficient record linkage

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Ed-Join: an efficient algorithm for similarity joins with edit distance constraints

Proceedings of the VLDB Endowment
Combining a Logical and a Numerical Method for Data Reconciliation

Journal on Data Semantics XII
RiMOM: A Dynamic Multistrategy Ontology Alignment Framework

IEEE Transactions on Knowledge and Data Engineering
Creating relational data from unstructured and ungrammatical data sources

Journal of Artificial Intelligence Research
Ontology matching with semantic verification

Web Semantics: Science, Services and Agents on the World Wide Web
AgreementMaker: efficient matching for large real-world schemas and ontologies

Proceedings of the VLDB Endowment
Discovering and Maintaining Links on the Web of Data

ISWC '09 Proceedings of the 8th International Semantic Web Conference
RKBExplorer.com: a knowledge driven infrastructure for linked data providers

ESWC'08 Proceedings of the 5th European semantic web conference on The semantic web: research and applications
Trie-join: efficient trie-based string similarity joins with edit-distance constraints

Proceedings of the VLDB Endowment
When owl: sameAs isn't the same: an analysis of identity in linked data

ISWC'10 Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I
A self-training approach for resolving object coreference on the semantic web

Proceedings of the 20th international conference on World wide web
Efficient exact edit similarity query processing with the asymmetric signature scheme

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Efficient similarity joins for near-duplicate detection

ACM Transactions on Database Systems (TODS)
Fast-join: An efficient method for fuzzy token matching based string similarity join

ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Zhishi.me: weaving chinese linking open data

ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part II
Ontology-driven automatic entity disambiguation in unstructured text

ISWC'06 Proceedings of the 5th international conference on The Semantic Web
Mining information for instance unification

ISWC'06 Proceedings of the 5th international conference on The Semantic Web
Leveraging terminological structure for object reconciliation

ESWC'10 Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part II
Leveraging unlabeled data to scale blocking for record linkage

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
Domain-Independent Entity Coreference for Linking Ontology Instances

Journal of Data and Information Quality (JDIQ) - Special Issue on Entity Resolution

Quantified Score

Hi-index	0.00

Visualization

Abstract

One challenge for the Semantic Web is to scalably establish high quality owl: same As links between co referent ontology instances in different data sources, traditional approaches that exhaustively compare every pair of instances do not scale well to large datasets. In this paper, we propose a pruning-based algorithm for reducing the complexity of entity co reference. First, we discard candidate pairs of instances that are not sufficiently similar to the same pool of other instances. A sigmoid function based thresholding method is proposed to automatically adjust the threshold for such commonality on-the-fly. In our prior work, each instance is associated with a context graph consisting of neighboring RDF nodes. In this paper, we speed up the comparison for a single pair of instances by pruning insignificant context in the graph, this is accomplished by evaluating its potential contribution to the final similarity measure. We evaluate our system on three Semantic Web instance categories. We verify the effectiveness of our thresholding and context pruning methods by comparing to nine state-of-the-art systems. We show that our algorithm frequently outperforms those systems with a runtime speedup factor of 18 to 24 while maintaining competitive F1-scores. For datasets of up to 1 million instances, this translates to as much as 370 hours improvement in runtime.