Bulk Operations for Space-Partitioning Trees

  • Authors:
  • Thanaa M. Ghanem; Rahul Shah; Mohamed F. Mokbel; Walid G. Aref; Jeffrey S. Vitter

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

The emergence of extensible index structures, e.g., GiST (Generalized Search Tree) and SP-GiST (Space-Partitioning Generalized Search Tree), calls for a set of extensible algorithms to support different operations (e.g., insertion, deletion, and search). Extensible bulk operations (e.g., bulk loading and bulk insertion) are of the same importance and need to be supported in these index engines. In this paper, we propose two extensible buffer-based algorithms for bulk operations in the class of space-partitioning trees, a class of hierarchical data structures that recursively decompose the space into disjoint partitions. The main idea of these algorithms is to build an in-memory tree of the target space-partitioning index. Then, data items are recursively partitioned into disk-based buffers using the in-memory tree. Although the second algorithm is designed for bulk insertion, it can be used in bulk loading as well. The proposed extensible algorithms are implemented inside SP-GiST, a framework for supporting the class of space-partitioning trees. Both algorithms have I/O bound O(NH/B), where N is the number of data items to be bulk loaded/inserted, B is the number of tree nodes that can fit in one disk page, and H is the tree height in terms of pages after applying a clustering algorithm. Experimental results are provided to show the scalability and applicability of the proposed algorithms for the class of space-partitioning trees. A comparison of the two proposed algorithms shows that the first algorithm performs better in the case of bulk loading. However, the second algorithm is more general and can be used for efficient bulk insertion.
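
The buffer-based idea in the abstract (keep a small tree in memory and push data items down it in batches rather than one at a time) can be illustrated with a minimal sketch. This is not the authors' SP-GiST implementation: it assumes a point quadtree as one example of a space-partitioning tree, uses plain Python lists as stand-ins for disk-based buffers, and every name in it (Node, quadrant, bulk_load, bulk_insert, capacity, max_depth) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    """Square region [x, x+size) x [y, y+size) of an illustrative quadtree."""
    x: float
    y: float
    size: float
    # Assumption: this list stands in for a disk-based buffer.
    points: List[Point] = field(default_factory=list)
    children: Optional[List["Node"]] = None


def quadrant(node: Node, p: Point) -> int:
    """Index (0..3) of the disjoint quadrant of `node` that contains `p`."""
    half = node.size / 2
    col = 1 if p[0] >= node.x + half else 0
    row = 1 if p[1] >= node.y + half else 0
    return 2 * row + col


def split(node: Node) -> None:
    """Decompose the node's region into four disjoint quadrants."""
    half = node.size / 2
    node.children = [
        Node(node.x + dx * half, node.y + dy * half, half)
        for dy in (0, 1)
        for dx in (0, 1)
    ]


def bulk_load(node: Node, capacity: int, max_depth: int) -> None:
    """Recursively partition a node's buffered points into child buffers."""
    if len(node.points) <= capacity or max_depth == 0:
        return
    split(node)
    for p in node.points:
        node.children[quadrant(node, p)].points.append(p)
    node.points = []  # the buffer has been emptied into the children
    for child in node.children:
        bulk_load(child, capacity, max_depth - 1)


def bulk_insert(node: Node, batch: List[Point], capacity: int, max_depth: int) -> None:
    """Route a whole batch down an existing tree, one level at a time."""
    if node.children is None:
        node.points.extend(batch)
        bulk_load(node, capacity, max_depth)
        return
    groups: List[List[Point]] = [[] for _ in range(4)]
    for p in batch:
        groups[quadrant(node, p)].append(p)
    for child, group in zip(node.children, groups):
        if group:
            bulk_insert(child, group, capacity, max_depth - 1)


def leaves(node: Node) -> Iterator[Node]:
    """Yield every leaf node of the tree."""
    if node.children is None:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)
```

In this sketch, calling bulk_insert on an empty root behaves like a bulk load, which mirrors the abstract's remark that the bulk-insertion algorithm can also be used for bulk loading. The max_depth parameter is an added safeguard so that duplicate points cannot cause unbounded splitting.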