A parallel index for semistructured data

  • Authors:
  • Brian F. Cooper; Neal Sample; Moshe Shadmon

  • Affiliations:
  • Stanford University, Stanford, CA; Stanford University, Stanford, CA; RightOrder Inc., San Jose, CA

  • Venue:
  • Proceedings of the 2002 ACM Symposium on Applied Computing
  • Year:
  • 2002
  • Indexing Open Schemas

    Revised Papers from the NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems

Abstract

Database systems are increasingly used to manage semistructured data, which may not have a fixed structure or set of relationships between data items. Indexes that use tree structures to manage semistructured data become unbalanced and difficult to parallelize because of the complex nature of the data. We propose a mechanism by which an unbalanced vertical tree is managed in a balanced way by additional layers of horizontal index. The vertical tree can then be partitioned among parallel computing nodes in a balanced fashion. We discuss how to construct, search, and update such a horizontal structure using the example of a Patricia trie index. We also present simulation results that demonstrate the speedup offered by such parallelism; for example, with three-way parallelism, our techniques can provide almost a factor of three speedup.
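
The general idea of cutting a skewed prefix tree into balanced partitions can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes plain string keys, uses character-by-character prefix grouping as a stand-in for a Patricia trie, and the names `split_by_prefix`, `assign`, and `route` are hypothetical. A small prefix-to-worker map plays the role of a single horizontal layer over the vertical tree.

```python
import heapq
from collections import defaultdict


def split_by_prefix(keys, limit):
    """Cut the key space into prefix groups of at most `limit` keys each.

    Assumption: descending one character at a time stands in for walking
    down an unbalanced trie until each subtree is small enough to move.
    """
    limit = max(1, limit)
    groups = {}
    stack = [("", sorted(set(keys)))]
    while stack:
        prefix, members = stack.pop()
        if len(members) <= limit:
            groups[prefix] = members
            continue
        # Too large: split the group by the next character of each key.
        # A key equal to the prefix itself forms its own one-key group.
        children = defaultdict(list)
        for key in members:
            children[key[: len(prefix) + 1]].append(key)
        stack.extend(children.items())
    return groups


def assign(groups, workers):
    """Greedy balancing: give the largest groups to the least-loaded worker."""
    heap = [(0, w) for w in range(workers)]
    owner = {}
    for prefix, members in sorted(groups.items(), key=lambda g: -len(g[1])):
        load, worker = heapq.heappop(heap)
        owner[prefix] = worker
        heapq.heappush(heap, (load + len(members), worker))
    return owner


def route(owner, key):
    """Find the worker for `key` via its longest prefix in the owner map."""
    for end in range(len(key), -1, -1):
        if key[:end] in owner:
            return owner[key[:end]]
    return None


if __name__ == "__main__":
    # Skewed example: most keys share the prefix "a".
    keys = [f"a{i:02d}" for i in range(9)] + ["b", "c", "d"]
    groups = split_by_prefix(keys, limit=4)
    owner = assign(groups, workers=3)
    for key in keys:
        print(key, "-> worker", route(owner, key))
```

In this sketch a lookup consults the small prefix map first and then touches only one worker's share of the keys, so independent lookups can proceed on different workers at once. How the paper's horizontal layers are actually built, searched, and updated is described in the paper itself and is not reproduced here.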