Query optimization for ontology-based information integration

Authors:
Yingjie Li; Jeff Heflin
Affiliations:
Lehigh University, Bethlehem, PA, USA; Lehigh University, Bethlehem, PA, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Abstract

In recent years, there has been an explosion of publicly available RDF and OWL data sources. To answer queries effectively and quickly in such an environment, we present an approach to identifying the potentially relevant Semantic Web data sources using query rewritings and a term index. We demonstrate that such an approach must carefully handle query subgoals that lack constants; otherwise the algorithm may identify many sources that do not contribute to eventual answers. This is because the term index only indicates whether URIs are present in a document, and specific answers to a subgoal cannot be calculated until the source is physically accessed, which is expensive given disk and network latency. We present an algorithm that, given a set of query rewritings that accounts for ontology heterogeneity, incrementally selects and processes sources in order to maintain selectivity. Once sources are selected, we use an OWL reasoner to answer queries over these sources and their corresponding ontologies. We present the results of experiments using both a synthetic data set and a subset of the real-world Billion Triple Challenge data.
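The core problem stated above can be illustrated with a small sketch. A subgoal without constants (such as `?x ex:advisor ?y`) matches every source that mentions its predicate. However, once bindings from already loaded sources are substituted into it, the index lookup becomes selective again. The sketch rests on several assumptions that do not come from the paper. It uses a hypothetical in-memory term index (URI to source ids) and represents sources as lists of triples. It takes the rewritings as already computed, and it replaces the OWL reasoner with plain triple matching. All function and variable names are illustrative. It is a simplified nested index-lookup join, not the authors' algorithm.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Variables are written "?name"; everything else is treated as a constant URI.
    return term.startswith("?")


def build_term_index(sources: Dict[str, List[Triple]]) -> Dict[str, Set[str]]:
    # Hypothetical term index: URI -> ids of the sources that mention it anywhere.
    index: Dict[str, Set[str]] = {}
    for src_id, triples in sources.items():
        for triple in triples:
            for term in triple:
                index.setdefault(term, set()).add(src_id)
    return index


def lookup(index: Dict[str, Set[str]], pattern: Triple) -> Set[str]:
    # Sources that mention every constant of the pattern. Presence is all the
    # index knows; real matches are only found after the source is loaded.
    constants = [t for t in pattern if not is_var(t)]
    if not constants:
        return set().union(*index.values())
    result = set(index.get(constants[0], set()))
    for term in constants[1:]:
        result &= index.get(term, set())
    return result


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    # Extend `binding` so that `pattern` equals `triple`, or return None.
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def next_pattern(remaining: List[Triple], bound: Set[str]) -> Triple:
    # Prefer a pattern whose subject or object is a constant or an already bound
    # variable; otherwise fall back to the first pattern (an unselective lookup).
    for s, p, o in remaining:
        if any(not is_var(t) or t in bound for t in (s, o)):
            return (s, p, o)
    return remaining[0]


def answer_rewriting(rewriting: List[Triple], sources: Dict[str, List[Triple]],
                     index: Dict[str, Set[str]], loaded: Set[str]) -> List[Binding]:
    remaining = list(rewriting)
    bindings: List[Binding] = [{}]
    while remaining and bindings:
        pattern = next_pattern(remaining, set(bindings[0]))
        remaining.remove(pattern)
        seen: Set[FrozenSet[Tuple[str, str]]] = set()
        extended: List[Binding] = []
        for binding in bindings:
            # Substitute current bindings so the index lookup has more constants.
            instantiated = tuple(binding.get(t, t) for t in pattern)
            for src_id in sorted(lookup(index, instantiated)):
                loaded.add(src_id)  # stands in for an expensive disk/network access
                for triple in sources[src_id]:
                    result = match(instantiated, triple, binding)
                    if result is not None and frozenset(result.items()) not in seen:
                        seen.add(frozenset(result.items()))
                        extended.append(result)
        bindings = extended
    return bindings


def answer(rewritings: List[List[Triple]], sources: Dict[str, List[Triple]],
           index: Dict[str, Set[str]]) -> Tuple[List[Binding], Set[str]]:
    # Union of answers over all rewritings, plus the set of sources accessed.
    loaded: Set[str] = set()
    answers: List[Binding] = []
    for rewriting in rewritings:
        for b in answer_rewriting(rewriting, sources, index, loaded):
            if b not in answers:
                answers.append(b)
    return answers, loaded


def naive_selection(rewritings: List[List[Triple]], index: Dict[str, Set[str]]) -> Set[str]:
    # Select sources per pattern without bindings, for comparison.
    return set().union(*(lookup(index, p) for r in rewritings for p in r))


sources = {
    "s1": [("ex:alice", "ex:advisor", "ex:bob"), ("ex:alice", "rdf:type", "ex:Student")],
    "s2": [("ex:carol", "ex:advisor", "ex:dave")],
    "s3": [("ex:bob", "ex:worksFor", "ex:lehigh")],
    "s4": [("ex:dave", "ex:worksFor", "ex:mit")],
}
query = [("?x", "rdf:type", "ex:Student"), ("?x", "ex:advisor", "?y"), ("?y", "ex:worksFor", "?z")]
index = build_term_index(sources)
answers, loaded = answer([query], sources, index)
print(answers)                                   # [{'?x': 'ex:alice', '?y': 'ex:bob', '?z': 'ex:lehigh'}]
print(sorted(loaded))                            # ['s1', 's3']
print(sorted(naive_selection([query], index)))   # ['s1', 's2', 's3', 's4']
```

In this toy example, the naive per-subgoal selection picks all four sources because the two constant-free subgoals match any source containing `ex:advisor` or `ex:worksFor`. The incremental version accesses only `s1` and `s3`.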