Merging R-Trees: Efficient Strategies for Local Bulk Insertion

  • Authors:
  • Li Chen;Rupesh Choubey;Elke A. Rundensteiner

  • Affiliations:
  • Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA 01609-2280 lichen@cs.wpi.edu;Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA 01609-2280 rupesh@cs.wpi.edu;Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA 01609-2280 rundenst@cs.wpi.edu

  • Venue:
  • Geoinformatica
  • Year:
  • 2002

Abstract

Much recent work has focused on bulk loading of data into multidimensional index structures in order to construct such structures efficiently for large datasets. In this paper, we address this problem with particular focus on R-trees, an important class of index structures widely used in commercial database systems. We propose a new technique that, in contrast to the current technique of inserting data one by one, bulk inserts entire new datasets into an active R-tree. This technique, called STLT (for small-tree-large-tree), treats the new dataset as an R-tree itself (the small tree), identifies and prepares a suitable location in the original R-tree (the large tree) for insertion, and finally inserts the small tree into the large tree. Besides an analytical cost model of STLT, we report extensive experimental studies on both synthetic and real GIS datasets. These experiments not only compare STLT against the conventional technique, but also evaluate the suitability and limitations of STLT under different conditions, such as varying buffer sizes, the ratio between existing and new data sizes, and the skewness of the new data with respect to the whole spatial region. We find that, in terms of insertion time, STLT does much better (on average, about 65%) than the existing technique for skewed datasets as well as for large sizes of both the large tree and the small tree, while keeping comparable query tree quality. In all other circumstances, STLT consistently outperforms the alternative technique in terms of bulk insertion time, by up to 2,000% in cases where the area of the new datasets covers up to 4% of the global region covered by the existing index tree; this gain, however, comes at the cost of deteriorating quality of the resulting tree.
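
The three STLT steps named in the abstract (build a small tree, locate a position in the large tree, insert the small tree there) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' algorithm: the `Node` class, the naive `pack` loader, the least-area-enlargement rule for choosing the location, and the helper names are all hypothetical, and node overflow and splitting are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


@dataclass
class Node:
    mbr: Rect  # minimum bounding rectangle of everything below this node
    children: List["Node"] = field(default_factory=list)  # empty => data entry


def height(node: Node) -> int:
    # Data entries have height 0, leaf nodes height 1, and so on.
    return 0 if not node.children else 1 + height(node.children[0])


def pack(rects: List[Rect], fanout: int = 4) -> Node:
    """Naively bulk-load rectangles into a balanced tree (hypothetical helper)."""
    if not rects:
        raise ValueError("need at least one rectangle")
    level = [Node(r) for r in sorted(rects)]
    while True:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = []
        for group in groups:
            mbr = group[0].mbr
            for n in group[1:]:
                mbr = union(mbr, n.mbr)
            level.append(Node(mbr, group))
        if len(level) == 1:
            return level[0]


def stlt_insert(large: Node, small: Node) -> None:
    """Attach the small tree's root as one entry of the large tree.

    Assumption: the location is chosen by least area enlargement, and the
    small root is placed one level above its own height so that all data
    entries stay at the same depth. Overflow handling is omitted.
    """
    target_height = height(small) + 1
    if height(large) < target_height:
        raise ValueError("sketch assumes the large tree is taller than the small tree")
    node = large
    node.mbr = union(node.mbr, small.mbr)
    while height(node) > target_height:
        node = min(
            node.children,
            key=lambda c: area(union(c.mbr, small.mbr)) - area(c.mbr),
        )
        node.mbr = union(node.mbr, small.mbr)
    node.children.append(small)  # a real R-tree would split the node if it overflows


# Usage: an 8 x 8 grid of unit squares as existing data, three new rectangles far away.
large = pack([(float(i), float(j), i + 1.0, j + 1.0) for i in range(8) for j in range(8)])
small = pack([(20.0, 20.0, 20.5, 20.5), (20.2, 20.2, 20.8, 20.8), (20.5, 20.0, 21.0, 20.4)])
stlt_insert(large, small)
```

The sketch performs one descent for the whole new dataset instead of one descent per object, which is the intuition behind the insertion-time gains reported above. It also suggests why tree quality can suffer: the small tree is attached as a single unit, so the enclosing rectangles on the path may grow substantially when the new data is spread out.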