Fast answering of XPath query workloads on web collections

Authors:
Mariano P. Consens; Flavio Rizzolo
Affiliations:
University of Toronto; University of Toronto
Venue:
XSym'07 Proceedings of the 5th international conference on Database and XML Technologies
Year:
2007

Abstract

Several web applications (such as processing RSS feeds or web service messages) rely on XPath-based data manipulation tools. Web developers need to use XPath queries effectively on increasingly large web collections containing hundreds of thousands of XML documents. Even when a task only needs to deal with a single document at a time, developers benefit from understanding the behaviour of XPath expressions across multiple documents (e.g., what will a query return when run over the thousands of hourly feeds collected during the last few months?). Dealing with the highly variable structure of such web collections poses additional challenges. This paper introduces DescribeX, a framework capable of describing arbitrarily complex XML summaries of web collections, enabling the efficient evaluation of XPath workloads (supporting all the axes and language constructs in XPath). Experiments validate that DescribeX enables existing document-at-a-time XPath tools to scale up to multi-gigabyte XML collections.
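
To make the idea of a structural summary concrete, the following is a minimal sketch, not the DescribeX framework itself: it builds a simple label-path summary over a tiny collection and uses it to skip documents that cannot match a query. The function names (build_path_summary, evaluate), the sample feeds, and the restriction to child-axis paths are all assumptions made for illustration; the summaries described in the abstract are far more general.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_summary(docs):
    """Map each root-to-element label path to the ids of documents containing it.

    Hypothetical helper: a simple label-path summary, used here only to
    illustrate how a summary can describe a whole collection.
    """
    summary = defaultdict(set)

    def walk(elem, prefix, doc_id):
        path = prefix + "/" + elem.tag
        summary[path].add(doc_id)
        for child in elem:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return summary


def evaluate(docs, summary, label_path):
    """Evaluate a child-axis-only path, one document at a time,
    visiting only the documents the summary lists for that path."""
    candidates = summary.get(label_path, set())
    steps = label_path.strip("/").split("/")
    # ElementTree paths are relative to the root element, so drop the first step.
    relative = "/".join(steps[1:]) or "."
    results = {}
    for doc_id in sorted(candidates):
        root = ET.fromstring(docs[doc_id])
        results[doc_id] = [e.text for e in root.findall(relative)]
    return results


# Made-up sample collection with variable structure across documents.
docs = {
    "feed1": "<rss><channel><item><title>A</title></item></channel></rss>",
    "feed2": "<rss><channel><title>Only channel title</title></channel></rss>",
    "feed3": "<feed><entry><title>B</title></entry></feed>",
}

summary = build_path_summary(docs)
hits = evaluate(docs, summary, "/rss/channel/item/title")
```

In this sketch the summary answers "which documents can this path match?" before any document is parsed for the query, so only feed1 is visited; a document-at-a-time XPath tool would then run on that reduced candidate set.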