Distributed query evaluation on semistructured data

Authors:
Dan Suciu
Affiliations:
University of Washington, Seattle, WA
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2002

Abstract

Semistructured data is modeled as a rooted, labeled graph. The simplest queries on such data traverse paths described by regular path expressions. More complex queries combine several regular path expressions with complex data restructuring and with subqueries. This article addresses the problem of efficient query evaluation on distributed, semistructured databases. In our setting, the nodes of the database are distributed over a fixed number of sites, and the edges are classified into local edges (with both ends in the same site) and cross edges (with ends in two distinct sites). Efficient evaluation in this context means that the number of communication steps is fixed (independent of the data or the query), and that the total amount of data sent depends only on the number of cross edges and on the size of the query's result. We give such algorithms in three different settings: first, for the simple case of queries consisting of a single regular expression; second, for all queries in a calculus for graphs based on structural recursion, which in addition to regular path expressions can perform nontrivial restructuring of the graph; and third, for a class of queries we call select-where queries, which combine pattern matching and regular path expressions with data restructuring and subqueries. This article also includes a discussion of how these methods can be used to derive efficient view maintenance algorithms.
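
The setting described in the abstract can be illustrated with a minimal sketch for the first case, a single regular path expression. This is not the article's algorithm. It is a single-process simulation written under several assumptions: the query is given as a deterministic automaton (`delta`, `q0`, `accepting`), every site summarizes its own fragment independently, and a coordinator combines the summaries. All names (`evaluate_distributed`, `site_of`, `explore`) are hypothetical. The point it tries to convey is that the number of phases is fixed and that the exchanged summaries are indexed only by cross-edge targets and automaton states.

```python
from collections import defaultdict

def evaluate_distributed(edges, site_of, root, delta, q0, accepting):
    """Simulate a fixed-round evaluation of a regular path query.

    edges:   list of (source, label, target) triples
    site_of: dict mapping each node to its site (an assumption of this sketch)
    delta:   DFA transitions as a dict {(state, label): next_state}
    """
    out = defaultdict(list)
    for src, label, dst in edges:
        out[src].append((label, dst))

    # Entry nodes: the root plus every target of a cross edge.
    entries = {root} | {d for s, _, d in edges if site_of[s] != site_of[d]}
    states = {q0} | {q for q, _ in delta} | set(delta.values())

    def explore(node, state):
        """Search from (node, state) without leaving the node's site."""
        site = site_of[node]
        seen, stack = {(node, state)}, [(node, state)]
        exits, hits = set(), set()
        while stack:
            n, q = stack.pop()
            if q in accepting:
                hits.add(n)                  # n is matched by the query
            for label, m in out[n]:
                q2 = delta.get((q, label))
                if q2 is None:
                    continue                 # the automaton rejects this label
                if site_of[m] != site:
                    exits.add((m, q2))       # cross edge: stop at the boundary
                elif (m, q2) not in seen:
                    seen.add((m, q2))
                    stack.append((m, q2))
        return exits, hits

    # Phase 1: each site would compute its own summaries in parallel.
    summary = {(n, q): explore(n, q) for n in entries for q in states}

    # Phase 2: the coordinator follows entry -> exit pairs from the root.
    reached, stack = {(root, q0)}, [(root, q0)]
    while stack:
        for pair in summary[stack.pop()][0]:
            if pair not in reached:
                reached.add(pair)
                stack.append(pair)

    # Phase 3: only summaries of reachable entries contribute results.
    result = set()
    for pair in reached:
        result |= summary[pair][1]
    return result

# Hypothetical two-site database and the query a.b* encoded as a DFA.
edges = [("r", "a", "x"), ("x", "b", "y"), ("y", "b", "z"),
         ("z", "c", "w"), ("r", "b", "w")]
site_of = {"r": 1, "x": 1, "w": 1, "y": 2, "z": 2}
delta = {(0, "a"): 1, (1, "b"): 1}
result = evaluate_distributed(edges, site_of, "r", delta, 0, {1})
print(sorted(result))  # ['x', 'y', 'z']
```

In this example the edges x to y and z to w are cross edges; the remaining edges are local. The three phases run regardless of how long the matching paths are, which is the sense in which the number of communication steps does not depend on the data.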