Workload-aware trie indices for XML

Authors:
Yuqing Wu;Sofia Brenes;Hyungdae Yi
Affiliations:
Indiana University, Bloomington, IN, USA;Indiana University, Bloomington, IN, USA;Indiana University, Bloomington, IN, USA
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Citing 5
Cited 1

APEX: an adaptive path index for XML data

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
TIMBER: A native XML database

The VLDB Journal — The International Journal on Very Large Data Bases
Structural properties of XPath fragments

Theoretical Computer Science - Database theory
Efficiently Querying Large XML Data Repositories: A Survey

IEEE Transactions on Knowledge and Data Engineering
A methodology for coupling fragments of XPath with structural indexes for XML documents

DBPL'07 Proceedings of the 11th international conference on Database programming languages

ASIC: algebra-based structural index comparison

Proceedings of the 18th ACM conference on Information and knowledge management

Abstract

Well-designed indices can dramatically improve query performance. Incorporating query workload information can produce indices that yield better overall throughput while balancing the trade-off between space and performance that lies at the core of index design. In the context of XML, structural indices have proven particularly effective in supporting XPath queries because they capture the structural correlation between data components in an XML document. In this paper, we propose a family of novel workload-aware indices built on the disk-based P[k]-Trie index framework, which indexes node pairs of an XML document to facilitate index-only evaluation plans. Our indices are designed to be optimal for answering frequent path queries, which need only one index lookup, and efficient for answering non-frequent path queries, which are evaluated using an index-only plan. Experimental results show that our indices outperform the APEX index in overall throughput and excel at answering non-frequent queries, queries with predicates, and queries that yield empty results.
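The abstract's idea of indexing node pairs by label path can be illustrated with a small in-memory sketch. This is not the disk-based structure or the workload-aware variants from the paper. It is a toy under stated assumptions: the names `PathTrie`, `build_index`, and `evaluate` are hypothetical, node ids are preorder positions, and a pair (start, end) is keyed by the labels of all nodes on the downward path from start to end, for paths of at most k edges. Paths of up to k edges are answered with one lookup; longer paths are answered by joining lookups on the shared node id, without touching the document.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


class PathTrie:
    """Toy trie mapping label paths (at most k edges) to (start_id, end_id) pairs."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.root = {"children": {}, "pairs": []}

    def insert(self, labels, pair):
        node = self.root
        for label in labels:
            node = node["children"].setdefault(label, {"children": {}, "pairs": []})
        node["pairs"].append(pair)

    def lookup(self, labels):
        node = self.root
        for label in labels:
            node = node["children"].get(label)
            if node is None:
                return []  # empty results are detected inside the index
        return node["pairs"]


def build_index(root, k):
    """Index every ancestor-or-self/descendant pair at most k edges apart."""
    trie = PathTrie(k)
    counter = 0

    def visit(elem, chain):
        nonlocal counter
        node_id = counter  # preorder id (an assumption of this sketch)
        counter += 1
        chain = chain + [(node_id, elem.tag)]
        for j in range(min(k, len(chain) - 1) + 1):
            segment = chain[len(chain) - 1 - j:]
            trie.insert([tag for _, tag in segment], (segment[0][0], node_id))
        for child in elem:
            visit(child, chain)

    visit(root, [])
    return trie


def evaluate(trie, labels):
    """Evaluate a child-axis label path using index lookups only."""
    k = trie.k
    result = trie.lookup(labels[:k + 1])  # one lookup covers up to k edges
    pos = k
    while pos < len(labels) - 1:
        by_start = defaultdict(list)
        for start, end in trie.lookup(labels[pos:pos + k + 1]):
            by_start[start].append(end)
        # Join: the end node of the prefix must be the start node of the next segment.
        result = [(s, e2) for s, e in result for e2 in by_start.get(e, [])]
        pos += k
    return result


if __name__ == "__main__":
    doc = ET.fromstring(
        "<lib><book><title/><author><name/></author></book><book><title/></book></lib>"
    )
    index = build_index(doc, k=2)
    print(evaluate(index, ["book", "title"]))
    print(evaluate(index, ["lib", "book", "author", "name"]))
```

In this sketch a larger k trades space for fewer joins, which mirrors the space and performance trade-off the abstract mentions; how the paper uses workload information to decide which paths to index is not modeled here.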