Merging R-Trees: Efficient Strategies for Local Bulk Insertion

Authors: Li Chen, Rupesh Choubey, Elke A. Rundensteiner
Affiliation: Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA 01609-2280 (lichen@cs.wpi.edu; rupesh@cs.wpi.edu; rundenst@cs.wpi.edu)
Venue: GeoInformatica
Year: 2002

Abstract

A lot of recent work has focused on bulk loading data into multidimensional index structures in order to construct such structures efficiently for large datasets. In this paper, we address this problem with a particular focus on R-trees, an important class of index structures widely used in commercial database systems. We propose a new technique that bulk inserts an entire new dataset into an active R-tree, instead of inserting the new data one item at a time as the current technique does. This technique, called STLT (small-tree-large-tree), treats the new dataset as an R-tree itself (the small tree), identifies and prepares a suitable location in the original R-tree (the large tree) for insertion, and finally inserts the small tree into the large tree. Besides an analytical cost model of STLT, we report extensive experimental studies on both synthetic and real GIS datasets. These experiments not only compare STLT against the conventional technique but also evaluate the suitability and limitations of STLT under different conditions, such as varying buffer sizes, the ratio between existing and new data sizes, and the skewness of the new data with respect to the whole spatial region. We find that, in terms of insertion time, STLT performs much better than the existing technique (by about 65% on average) for skewed datasets, as well as when both the large tree and the small tree are large, while keeping comparable tree quality for queries. In all other circumstances STLT also consistently outperforms the conventional technique in bulk insertion time, by as much as 2,000% when the area of the new dataset covers up to 4% of the global region covered by the existing index tree, though at the cost of degraded quality of the resulting tree.
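The three STLT steps named above can be sketched as follows: build a small tree from the new data, locate a place for it in the large tree, and graft it in as a single subtree. This is a minimal Python illustration under our own assumptions, not the authors' algorithm. The abstract does not say how the insertion location is selected or prepared. The sketch therefore assumes a descent by least MBR enlargement (in the spirit of Guttman's R-tree ChooseSubtree) to the level whose subtrees have the same height as the small tree. It attaches the small tree's root there as one entry and resolves overflow with a naive split along x. The names `pack`, `split`, `stlt_insert`, and `leaves` and the node capacity `cap` are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


@dataclass
class Node:
    leaf: bool
    entries: list  # leaf: data Rects; internal: child Nodes

    def mbr(self) -> Rect:
        boxes = self.entries if self.leaf else [c.mbr() for c in self.entries]
        box = boxes[0]
        for b in boxes[1:]:
            box = union(box, b)
        return box

    def height(self) -> int:
        return 1 if self.leaf else 1 + self.entries[0].height()


def center_x(entry) -> float:
    r = entry if isinstance(entry, tuple) else entry.mbr()
    return (r[0] + r[2]) / 2


def chunks(items: list, cap: int) -> list[list]:
    return [items[i:i + cap] for i in range(0, len(items), cap)]


def pack(rects: list[Rect], cap: int) -> Node:
    """Build a tree bottom-up: sort by x-center, fill nodes to `cap` (simplified packing)."""
    level = [Node(True, c) for c in chunks(sorted(rects, key=center_x), cap)]
    while len(level) > 1:
        level = [Node(False, c) for c in chunks(sorted(level, key=center_x), cap)]
    return level[0]


def split(node: Node) -> Node:
    """Naive split along x (not Guttman's quadratic split); returns the new sibling."""
    node.entries.sort(key=center_x)
    half = len(node.entries) // 2
    sibling = Node(node.leaf, node.entries[half:])
    node.entries = node.entries[:half]
    return sibling


def stlt_insert(large: Node, small: Node, cap: int) -> Node:
    """Graft `small` into `large` as one subtree; returns the root afterwards.

    Hypothetical sketch: least-enlargement descent and the naive split are
    assumptions, not the paper's location-selection/preparation procedure.
    """
    h_small, h_large = small.height(), large.height()
    if h_small > h_large:
        raise ValueError("small tree is taller than the large tree")
    if h_small == h_large:  # equal heights: both become children of a new root
        return Node(False, [large, small])
    box = small.mbr()
    path = [large]
    # Descend until the current node's children have the small tree's height,
    # picking the child whose MBR needs the least enlargement (ties: smaller area).
    while path[-1].height() - 1 > h_small:
        path.append(min(path[-1].entries,
                        key=lambda c: (area(union(c.mbr(), box)) - area(c.mbr()), area(c.mbr()))))
    path[-1].entries.append(small)  # the whole small tree becomes a single entry
    # Split overflowing nodes bottom-up along the descent path.
    for i in range(len(path) - 1, -1, -1):
        if len(path[i].entries) <= cap:
            break
        sibling = split(path[i])
        if i == 0:  # the root overflowed: grow the tree by one level
            return Node(False, [path[0], sibling])
        path[i - 1].entries.append(sibling)
    return large


def leaves(node: Node, depth: int = 0) -> Iterator[tuple[int, Node]]:
    """Yield (depth, leaf) pairs, e.g. to check that every leaf sits at one depth."""
    if node.leaf:
        yield depth, node
    else:
        for child in node.entries:
            yield from leaves(child, depth + 1)
```

For example, `stlt_insert(pack(existing, cap=4), pack(new, cap=4), cap=4)` grafts the whole new dataset with one root-to-level descent and at most one split per level. One-by-one insertion, by contrast, performs a descent for every new rectangle. Because the small tree enters as a unit, a spread-out new dataset produces a subtree whose MBR overlaps many existing nodes. This is one plausible reason why speed can come at the cost of tree quality when the new data is not localized.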