Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th International Conference on Very Large Data Bases
Executing SPARQL Queries over the Web of Linked Data
ISWC '09 Proceedings of the 8th International Semantic Web Conference
Data summaries for on-demand queries over linked data
Proceedings of the 19th International Conference on World Wide Web
An evaluation of approaches to federated query processing over linked data
Proceedings of the 6th International Conference on Semantic Systems
Linked data query processing strategies
ISWC'10 Proceedings of the 9th International Semantic Web Conference on The Semantic Web - Volume Part I
Discovery of frequent patterns in transactional data streams
Transactions on Large-Scale Data- and Knowledge-Centered Systems II
Zero-knowledge query planning for an iterator implementation of link traversal based query execution
ESWC'11 Proceedings of the 8th Extended Semantic Web Conference on The Semantic Web: Research and Applications - Volume Part I
Traditionally, Linked Data query engines execute SPARQL queries over a materialised repository, which guarantees fast query answering but requires time- and resource-consuming preprocessing steps. In addition, materialised repositories face the ongoing challenge of index maintenance, which is, given the size of the Web, practically infeasible; the results for a given SPARQL query are therefore potentially outdated. Recent approaches address this result-freshness problem by answering a given query directly over dereferenced, query-relevant Web documents. Our work investigates the problem of efficiently selecting query-relevant sources in this context. As part of query optimization, source selection estimates the minimum number of sources that must be accessed to answer a query. We propose to summarize and index sources based on frequently appearing query graph patterns mined from query logs. We verify the applicability of our approach and show empirically that it significantly reduces the estimated number of relevant sources while keeping the overhead low.
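The summarisation idea can be illustrated with a minimal sketch: mine patterns that appear frequently in a query log, index each source only under those frequent patterns, and answer source selection by a lookup in that index. This is a toy illustration under assumptions of my own (predicate strings stand in for full graph patterns; the log, sources, and support threshold are invented), not the paper's actual algorithm or data structures.

```python
from collections import Counter, defaultdict

# Hypothetical query log: each query is the set of triple-pattern
# shapes it contains (predicates stand in for full graph patterns).
query_log = [
    {"foaf:name", "foaf:knows"},
    {"foaf:name", "foaf:mbox"},
    {"foaf:name", "foaf:knows"},
    {"dc:title"},
]

# 1. Mine frequently appearing patterns (assumed support threshold = 2).
support = Counter(p for query in query_log for p in query)
frequent = {p for p, count in support.items() if count >= 2}

# 2. Summarise sources: index each source only under its frequent
#    patterns, keeping the summary small.
source_patterns = {
    "http://example.org/alice": {"foaf:name", "foaf:knows"},
    "http://example.org/bob": {"foaf:name", "foaf:mbox"},
    "http://example.org/doc": {"dc:title"},
}
index = defaultdict(set)
for source, patterns in source_patterns.items():
    for p in patterns & frequent:
        index[p].add(source)

# 3. Source selection: estimate the relevant sources for a query
#    as the union of sources indexed under its patterns.
def select_sources(query_patterns):
    sources = set()
    for p in query_patterns:
        sources |= index.get(p, set())
    return sources
```

Note the trade-off this sketch makes visible: a query using only infrequent patterns (e.g. `{"dc:title"}` above) finds no sources in the summary and would need a fallback strategy, which is the price of keeping the index restricted to patterns that query logs show to be common.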