Comparing data summaries for processing live queries over Linked Data

Authors:
Jürgen Umbrich;Katja Hose;Marcel Karnstedt;Andreas Harth;Axel Polleres
Affiliations:
Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland;Max-Planck-Institut für Informatik, Saarbrücken, Germany;Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland;Institute AIFB, Karlsruhe Institute of Technology, Karlsruhe, Germany;Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Venue:
World Wide Web
Year:
2011

Abstract

A growing amount of Linked Data (graph-structured data accessible at sources distributed across the Web) enables advanced data integration and decision-making applications. Typical systems operating on Linked Data collect (crawl) and pre-process (index) large amounts of data, and evaluate queries against a centralised repository. Given that crawling and indexing are time-consuming operations, the data in the centralised index may be out of date at query execution time. An ideal query answering system for querying Linked Data live should return current answers in a reasonable amount of time, even on corpora as large as the Web. In such a live query system, source selection (determining which sources contribute answers to a query) is a crucial step. In this article, we propose to use lightweight data summaries for determining relevant sources during query evaluation. We compare several data structures and hash functions with respect to their suitability for building such summaries, stressing benefits for queries that contain joins and require ranking of results and sources. We elaborate on join variants, join ordering and ranking. We analyse the different approaches theoretically and provide results of an extensive experimental evaluation.
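
To make the idea of summary-based source selection concrete, the following is a minimal illustrative sketch and not any of the specific data structures or hash functions the article evaluates. It assumes a toy summary that hashes each triple's subject, predicate and object into a fixed number of buckets and records, per bucket, how many triples each source contributed. The class name `HashSummary`, the bucket count, and the use of CRC32 as the hash function are all assumptions made for illustration only.

```python
import zlib
from collections import Counter, defaultdict


class HashSummary:
    """Toy data summary (hypothetical): hashed (s, p, o) -> per-source counts."""

    def __init__(self, buckets_per_dim=64):
        # Number of hash buckets per triple position (an arbitrary choice).
        self.n = buckets_per_dim
        # Maps a bucket coordinate (hs, hp, ho) to a Counter of source -> count.
        self.buckets = defaultdict(Counter)

    def _h(self, term):
        # Deterministic hash of an RDF term string; CRC32 is an assumption here.
        return zlib.crc32(term.encode("utf-8")) % self.n

    def add(self, source, s, p, o):
        # Record that `source` contains one triple falling into this bucket.
        self.buckets[(self._h(s), self._h(p), self._h(o))][source] += 1

    def select(self, s=None, p=None, o=None):
        """Rank sources for a triple pattern; None marks a variable position."""
        want = tuple(None if t is None else self._h(t) for t in (s, p, o))
        scores = Counter()
        for key, per_source in self.buckets.items():
            # A bucket is a candidate if it agrees on every bound position.
            if all(w is None or w == k for w, k in zip(want, key)):
                scores.update(per_source)
        # Sources with more (estimated) matching triples come first.
        return [src for src, _ in scores.most_common()]
```

Because distinct terms can hash to the same bucket, such a summary may return sources that do not actually contribute answers (false positives), but it never misses a source that was added with a matching triple. A query processor would then fetch only the selected sources, for example `HashSummary.select(p="foaf:knows")`, instead of consulting every known source.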