Faster path indexes for search in XML data

Authors:
Nils Grimsmo
Affiliations:
Norwegian University of Science and Technology, Trondheim, Norway
Venue:
ADC '08 Proceedings of the nineteenth conference on Australasian database - Volume 75
Year:
2008

Abstract

This article describes how to implement efficient memory-resident path indexes for semi-structured data. Two techniques are introduced, and both are shown to be significantly faster than previous methods on path queries that use the descendant axis and wildcards. The first is conceptually simple and combines inverted lists, selectivity estimation, hit expansion, and brute-force search. The second uses suffix trees with additional statistics and multiple entry points into the query. The entry points are partially evaluated in an order based on estimated cost until one of them is complete. Many path index implementations are tested, using paths generated from both statistical models and DTDs.
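
To make the first technique more concrete, the following is a minimal Python sketch of the general idea of combining inverted lists, a simple selectivity estimate, and brute-force verification. It is not the article's implementation: the class `PathIndex`, the helpers `parse_query` and `matches`, the restricted query syntax (`/`, `//`, and `*` only), and the rule that a query must match a whole root-to-node label path are all assumptions made for illustration. The hit expansion step is left out because the abstract does not describe it.

```python
import re
from collections import defaultdict


def parse_query(query):
    """Split e.g. '//a/*//c' into [('desc','a'), ('child','*'), ('desc','c')]."""
    steps = []
    for sep, label in re.findall(r"(//|/)([^/]+)", query):
        steps.append(("desc" if sep == "//" else "child", label))
    return steps


def matches(steps, path):
    """Brute-force check: does the query match the whole label path?"""
    current = {-1}  # positions matched by the previous step; -1 is a virtual root
    for axis, label in steps:
        nxt = set()
        for pos in current:
            # Child axis: exactly the next position. Descendant axis: any later one.
            candidates = range(pos + 1, len(path)) if axis == "desc" else [pos + 1]
            for k in candidates:
                if k < len(path) and label in ("*", path[k]):
                    nxt.add(k)
        current = nxt
    # Assumption: the last query step must land on the last label of the path.
    return len(path) - 1 in current


class PathIndex:
    """Hypothetical in-memory index over distinct root-to-node label paths."""

    def __init__(self, paths):
        self.paths = [tuple(p) for p in paths]
        self.postings = defaultdict(set)  # label -> ids of paths containing it
        for pid, path in enumerate(self.paths):
            for label in path:
                self.postings[label].add(pid)

    def search(self, query):
        steps = parse_query(query)
        labels = [label for _, label in steps if label != "*"]
        if labels:
            # Crude selectivity estimate: start from the shortest inverted list.
            rarest = min(labels, key=lambda l: len(self.postings.get(l, ())))
            candidates = self.postings.get(rarest, set())
        else:
            candidates = range(len(self.paths))  # only wildcards: scan everything
        return sorted(pid for pid in candidates if matches(steps, self.paths[pid]))


index = PathIndex([
    ["site", "people", "person", "name"],
    ["site", "regions", "item", "name"],
    ["site", "people", "person", "address", "city"],
])
print(index.search("//person//city"))
print(index.search("/site/*/*/name"))
```

In this sketch the inverted list of the rarest label narrows the candidate set, and the brute-force matcher then handles the descendant axis and wildcards on the few remaining paths. The suffix-tree technique with multiple entry points is not covered here.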