In recent years, there has been an explosion of publicly available RDF and OWL data sources. To answer queries quickly and effectively in such an environment, we present an approach to identifying the potentially relevant Semantic Web data sources using query rewritings and a term index. We demonstrate that such an approach must carefully handle query goals that lack constants; otherwise the algorithm may identify many sources that do not contribute to eventual answers. This is because the term index only indicates whether URIs are present in a document, and specific answers to a subgoal cannot be calculated until the source is physically accessed, which is an expensive operation given disk and network latency. We present an algorithm that, given a set of query rewritings that accounts for ontology heterogeneity, incrementally selects and processes sources in order to maintain selectivity. Once sources are selected, we use an OWL reasoner to answer queries over these sources and their corresponding ontologies. We present the results of experiments using both a synthetic data set and a subset of the real-world Billion Triple Challenge data.
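The idea of incremental source selection can be illustrated with a small sketch. This is not the algorithm from the paper; it is a simplified illustration under stated assumptions: the term index is modeled as a plain dictionary from URI to the set of documents mentioning it, sources are in-memory lists of triples, and query rewriting and OWL reasoning are omitted entirely. All names (`answer`, `sources_for`, `SOURCES`, the `ex:` URIs) are hypothetical. The sketch processes the most selective subgoal first and substitutes the bindings it yields into the remaining subgoals, so that a subgoal that initially lacks constants gains some before the index is consulted for it.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def candidate_sources(index: Dict[str, Set[str]], terms: List[str]) -> Optional[Set[str]]:
    """Sources mentioning every given URI; None when there are no URIs to look up."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else None


def sources_for(index, goal: Triple, bindings: List[Binding]) -> Optional[Set[str]]:
    """Union of candidate sources for the goal under each current binding."""
    needed: Set[str] = set()
    for b in bindings:
        terms = [b.get(t, t) for t in goal]           # substitute known variables
        found = candidate_sources(index, [t for t in terms if not is_var(t)])
        if found is None:                              # goal still has no constants
            return None
        needed |= found
    return needed


def match(goal: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend the binding so the goal matches the triple, or return None."""
    new = dict(binding)
    for g, v in zip(goal, triple):
        if is_var(g):
            if new.setdefault(g, v) != v:
                return None
        elif g != v:
            return None
    return new


def answer(index, goals: List[Triple], load: Callable[[str], List[Triple]]):
    """Process goals most-selective-first; return (bindings, sources accessed)."""
    bindings: List[Binding] = [{}]
    accessed: Set[str] = set()
    remaining = list(goals)
    all_sources = set().union(*index.values()) if index else set()
    while remaining and bindings:
        scored = [(sources_for(index, g, bindings), g) for g in remaining]
        selective = [(s, g) for s, g in scored if s is not None]
        if selective:
            srcs, goal = min(selective, key=lambda sg: len(sg[0]))
        else:                                          # assumption: fall back to every source
            srcs, goal = all_sources, remaining[0]
        remaining.remove(goal)
        accessed |= srcs
        triples = [t for s in sorted(srcs) for t in load(s)]   # the expensive step
        bindings = [nb for b in bindings for t in triples
                    if (nb := match(goal, t, b)) is not None]
    return bindings, accessed


# Hypothetical toy data: four documents, one triple each.
SOURCES: Dict[str, List[Triple]] = {
    "d1": [("ex:alice", "ex:advisor", "ex:bob")],
    "d2": [("ex:bob", "ex:worksAt", "ex:lehigh")],
    "d3": [("ex:carol", "ex:worksAt", "ex:mit")],
    "d4": [("ex:dave", "ex:advisor", "ex:erin")],
}
INDEX: Dict[str, Set[str]] = {}
for doc, triples in SOURCES.items():
    for triple in triples:
        for uri in triple:
            INDEX.setdefault(uri, set()).add(doc)

GOALS = [("?y", "ex:worksAt", "?z"), ("ex:alice", "ex:advisor", "?y")]
RESULT, ACCESSED = answer(INDEX, GOALS, SOURCES.__getitem__)
```

In this toy run, the goal mentioning `ex:alice` is processed first because the index maps it to a single document. Its answer binds `?y`, which narrows the `ex:worksAt` goal to one document as well, so `d3` and `d4` are never loaded. Looking up the `ex:worksAt` goal on its own would have selected both `d2` and `d3`.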