A parallel index for semistructured data

Authors:
Brian F. Cooper; Neal Sample; Moshe Shadmon
Affiliations:
Stanford University, Stanford, CA; Stanford University, Stanford, CA; RightOrder Inc., San Jose, CA
Venue:
Proceedings of the 2002 ACM Symposium on Applied Computing
Year:
2002

Abstract

Database systems are increasingly used to manage semistructured data, which may lack a fixed structure or a fixed set of relationships between data items. Tree-structured indexes over such data tend to become unbalanced because of the data's irregular structure, and this imbalance also makes them difficult to parallelize. We propose a mechanism in which an unbalanced vertical tree is managed in a balanced way by additional layers of horizontal index. The vertical tree can then be partitioned among parallel computing nodes in a balanced fashion. We discuss how to construct, search, and update such a horizontal structure, using a Patricia trie index as the example. We also present simulation results demonstrating the speedup this parallelism offers: for example, with three-way parallelism, our techniques provide almost a factor of three speedup.
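The abstract does not spell out the construction, so the following Python sketch only illustrates the general idea under assumptions of our own. A plain character trie stands in for the Patricia trie (a Patricia trie would also collapse single-child chains). The vertical trie is cut into subtries of at most `block_size` keys, and a sorted list of block prefixes searched with `bisect` stands in for the paper's horizontal index layers. Blocks are then spread over nodes with a greedy largest-first heuristic. All function names (`build_trie`, `partition`, `lookup`, `assign`, `simulated_speedup`), the `$` terminator, and the speedup measure (total lookups divided by the busiest node's lookups) are hypothetical choices made for this sketch, not the authors' method or their simulation.

```python
from bisect import bisect_right
from collections import Counter

END = "$"  # hypothetical terminator; keys must not contain it


def build_trie(keys):
    """Vertical layer: a plain character trie over the keys. A Patricia trie
    would also collapse single-child chains; this sketch does not."""
    root = {}
    for key in keys:
        node = root
        for ch in key + END:
            node = node.setdefault(ch, {})
    return root


def count_leaves(node):
    """Number of keys stored under a node (each leaf ends exactly one key)."""
    return 1 if not node else sum(count_leaves(c) for c in node.values())


def partition(node, prefix, block_size, blocks):
    """Cut the possibly unbalanced trie into subtries holding at most
    block_size keys; each block is recorded with the path prefix leading to it.
    Children are visited in sorted order, so blocks come out sorted by prefix."""
    if count_leaves(node) <= block_size:
        blocks.append((prefix, node))
        return
    for ch in sorted(node):
        partition(node[ch], prefix + ch, block_size, blocks)


def lookup(prefixes, blocks, key):
    """Stand-in horizontal layer: binary search over the sorted block prefixes
    picks the only block that could hold the key, then we descend inside it.
    Returns the block index, or None if the key is absent."""
    path = key + END
    i = bisect_right(prefixes, path) - 1
    if i < 0 or not path.startswith(prefixes[i]):
        return None
    node = blocks[i][1]
    for ch in path[len(prefixes[i]):]:
        if ch not in node:
            return None
        node = node[ch]
    return i


def assign(blocks, n_nodes):
    """Balance blocks across nodes: largest block first, each one going to the
    currently least-loaded node (a greedy heuristic chosen for this sketch)."""
    load = [0] * n_nodes
    owner = {}
    for b in sorted(range(len(blocks)), key=lambda j: -count_leaves(blocks[j][1])):
        target = load.index(min(load))
        owner[b] = target
        load[target] += count_leaves(blocks[b][1])
    return owner


def simulated_speedup(keys, queries, block_size, n_nodes):
    """Idealized speedup: total lookups divided by the busiest node's lookups.
    Every query must be one of the indexed keys."""
    blocks = []
    partition(build_trie(keys), "", block_size, blocks)
    prefixes = [p for p, _ in blocks]
    owner = assign(blocks, n_nodes)
    work = Counter(owner[lookup(prefixes, blocks, q)] for q in queries)
    return len(queries) / max(work.values())


if __name__ == "__main__":
    # A deliberately skewed key set: the trie degenerates into a deep chain.
    keys = ["a" * i for i in range(1, 31)]
    print(simulated_speedup(keys, keys, block_size=5, n_nodes=3))  # prints 3.0
```

On this chain-shaped example the vertical trie is 31 levels deep, yet every lookup touches one prefix search plus a single bounded block, and the three nodes each serve 10 of the 30 keys. Cutting purely by subtree size can produce many tiny blocks along a skewed chain, as it does here. The exact 3.0 is a property of this toy workload and the idealized work count, not a reproduction of the paper's reported results.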