Bulk construction of dynamic clustered metric trees

Authors:
Lior Aronovich; Israel Spiegler
Affiliations:
Tel Aviv University, Information Systems Department, Tel Aviv, Israel; Tel Aviv University, Information Systems Department, Tel Aviv, Israel
Venue:
Knowledge and Information Systems
Year:
2010

Abstract

Repositories of complex data types, such as images, audio, video and free text, are becoming increasingly common in various fields. A general approach to searching such data is similarity search, in which the query asks for similar objects and similarity is modeled by a metric distance function. An important class of access methods for similarity search in metric data is that of dynamic clustered metric trees, where the index is structured as a paged and balanced tree and the space is partitioned hierarchically into compact regions. While access methods of this class support dynamic insertion, typically of single objects, the problem of efficiently inserting a given data set into the index in bulk is largely open. In this article we address this problem and propose novel algorithms for its two cases: the index is initially empty (i.e., bulk loading), or the index is initially non-empty (i.e., bulk insertion). The proposed bulk loading algorithm builds the index bottom-up, layer by layer, using a new sampling-based clustering method that improves clustering results by improving the quality of the selected sample sets. The proposed bulk insertion algorithm employs the bulk loading algorithm to load the given data into a new index structure, and then merges the new and the existing structures into a unified high-quality index, using a novel decomposition method to reduce overlaps between the structures. Both algorithms yield significantly improved construction and search performance, and are applicable to all dynamic clustered metric trees. Results from an extensive experimental study show that the proposed algorithms outperform alternative methods, reducing construction costs by up to 47% for CPU and 99% for I/O, and search costs by up to 48% for CPU and 30% for I/O.
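
To make the idea of bottom-up, layer-by-layer bulk loading concrete, the following is a minimal Python sketch. It is not the authors' algorithm: it swaps in plain uniform random sampling of cluster centers, whereas the article proposes a method that improves the quality of the sample sets. The names `Node`, `cluster_by_sampling`, `bulk_load`, `range_search` and the `capacity` parameter are all hypothetical, and Euclidean distance stands in for an arbitrary metric.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    center: tuple          # routing object of the region (assumed representation)
    radius: float          # covering radius: bounds the distance to every object below
    children: list = field(default_factory=list)  # points in leaves, Nodes above
    is_leaf: bool = True


def euclidean(a, b):
    return math.dist(a, b)


def cluster_by_sampling(items, key, capacity, dist, rng):
    """Group items around uniformly sampled centers, at most `capacity` per group."""
    k = math.ceil(len(items) / capacity)
    centers = [key(items[i]) for i in rng.sample(range(len(items)), k)]
    groups = [[] for _ in centers]
    for item in items:
        # Try centers from nearest to farthest and take the first with free space.
        order = sorted(range(k), key=lambda c: dist(key(item), centers[c]))
        for c in order:
            if len(groups[c]) < capacity:
                groups[c].append(item)
                break
    return [(centers[c], g) for c, g in enumerate(groups) if g]


def bulk_load(points, capacity=4, dist=euclidean, seed=0):
    """Build a balanced tree bottom-up: cluster points into leaves, then cluster nodes."""
    if not points or capacity < 2:
        raise ValueError("need at least one point and capacity >= 2")
    rng = random.Random(seed)
    layer = [
        Node(c, max(dist(c, p) for p in g), g, True)
        for c, g in cluster_by_sampling(list(points), lambda p: p, capacity, dist, rng)
    ]
    while len(layer) > 1:
        groups = cluster_by_sampling(layer, lambda n: n.center, capacity, dist, rng)
        # Triangle inequality: dist(c, child.center) + child.radius covers the subtree.
        layer = [
            Node(c, max(dist(c, n.center) + n.radius for n in g), g, False)
            for c, g in groups
        ]
    return layer[0]


def range_search(node, q, r, dist=euclidean):
    """Return all points within distance r of q, pruning regions that cannot qualify."""
    if dist(q, node.center) > r + node.radius:
        return []
    if node.is_leaf:
        return [p for p in node.children if dist(q, p) <= r]
    return [p for c in node.children for p in range_search(c, q, r, dist)]
```

A call such as `bulk_load(points, capacity=4)` returns the root node, and `range_search(root, q, r)` uses the covering radii to skip regions that cannot contain a match. Because each layer is built from the one below, all leaves end up at the same depth, which mirrors the balanced structure described in the abstract. The quality of the resulting regions depends heavily on how centers are sampled, which is the aspect the article's clustering method targets.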