Efficiently Querying Large XML Data Repositories: A Survey

Authors:
Gang Gou; Rada Chirkova
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2007

Abstract

Extensible Markup Language (XML) is emerging as a de facto standard for information exchange among various applications on the World Wide Web. There has been a growing need for developing high-performance techniques to query large XML data repositories efficiently. One important problem in XML query processing is twig pattern matching, that is, finding in an XML data tree D all matches that satisfy a specified twig (or path) query pattern Q. In this survey we review, classify, and compare major techniques for twig pattern matching. Specifically, we consider two classes of major XML query-processing techniques: the relational approach and the native approach. The relational approach directly utilizes existing relational database systems to store and query XML data, which enables the use of all important techniques that have been developed for relational databases, while in the native approach, specialized storage and query-processing systems tailored for XML data are developed from scratch to further improve XML query performance. As implied by existing work, XML data querying and management are developing in the direction of integrating the relational approach with the native approach, which could result in higher query-processing performance and also significantly reduce system-reengineering costs.
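
The twig pattern matching problem stated in the abstract can be made concrete with a small sketch. The Python code below is a naive, in-memory illustration of the problem definition only, not one of the relational or native techniques the survey covers. The names `QNode`, `candidates`, `embeddings`, and `twig_matches`, the sample document, and the choice to represent a match as a tuple of data nodes are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from itertools import product


@dataclass
class QNode:
    """One node of a twig query pattern Q (hypothetical representation)."""
    tag: str
    # Edge from the parent query node: "child" (/) or "descendant" (//).
    axis: str = "descendant"
    children: list = field(default_factory=list)


def candidates(elem, axis):
    """Data nodes reachable from elem along the given axis."""
    if axis == "child":
        return list(elem)
    # iter() yields elem itself first, then its descendants in document order.
    return [d for d in elem.iter() if d is not elem]


def embeddings(q, elem):
    """All matches of the sub-twig rooted at q, with q mapped to elem.

    Each match is a tuple of data nodes in preorder of the query nodes.
    """
    if elem.tag != q.tag:
        return []
    per_child = []
    for qc in q.children:
        found = [m for c in candidates(elem, qc.axis) for m in embeddings(qc, c)]
        if not found:
            return []  # one unsatisfied branch rules out this data node
        per_child.append(found)
    # Combine one match per query branch into full matches.
    return [(elem,) + sum(combo, ()) for combo in product(*per_child)]


def twig_matches(q, root):
    """All matches of Q in the data tree D rooted at root (naive scan)."""
    return [m for e in root.iter() for m in embeddings(q, e)]


# Assumed sample data tree D.
DOC = """
<bib>
  <book><title>XML</title>
    <author><name>A</name></author>
    <author><name>B</name></author>
  </book>
  <book><title>SQL</title></book>
  <article><author><name>C</name></author></article>
</bib>
"""

# Twig query Q, roughly the XPath pattern book[title]//name.
Q = QNode("book", children=[QNode("title", "child"), QNode("name", "descendant")])

matches = twig_matches(Q, ET.fromstring(DOC))
for book, title, name in matches:
    print(title.text, name.text)
```

This sketch revisits subtrees repeatedly and materializes every combination of branch matches, so it is only suitable for tiny documents; it shows what is being computed, not how to compute it efficiently over large repositories.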