Repositories of complex data types, such as images, audio, video and free text, are becoming increasingly common in various fields. A general approach to searching such data is similarity search, in which the query asks for similar objects and similarity is modeled by a metric distance function. An important class of access methods for similarity search in metric data is that of dynamic clustered metric trees, where the index is structured as a paged, balanced tree and the space is partitioned hierarchically into compact regions. While access methods of this class typically support dynamic insertion of single objects, the problem of efficiently inserting a given data set into the index in bulk is largely open. In this article we address this problem and propose novel algorithms for its two cases: the index is initially empty (i.e., bulk loading), or the index is initially nonempty (i.e., bulk insertion). The proposed bulk loading algorithm builds the index bottom-up, layer by layer, using a new sampling-based clustering method that improves clustering results by improving the quality of the selected sample sets. The proposed bulk insertion algorithm employs the bulk loading algorithm to load the given data into a new index structure, and then merges the new and the existing structures into a unified, high-quality index, using a novel decomposition method to reduce overlaps between the structures. Both algorithms yield significantly improved construction and search performance, and are applicable to all dynamic clustered metric trees. Results from an extensive experimental study show that the proposed algorithms outperform alternative methods, reducing construction costs by up to 47% for CPU costs and 99% for I/O costs, and search costs by up to 48% for CPU costs and 30% for I/O costs.
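To make the bottom-up, layer-by-layer construction concrete, the following is a minimal Python sketch of bulk loading a generic clustered metric tree. It is not the algorithm proposed in the article: the seeds are drawn by plain uniform random sampling, whereas the article's contribution lies in selecting better sample sets, and the merge and decomposition steps of bulk insertion are not shown. The names `Node`, `cluster_by_sample`, `bulk_load`, `range_search` and the `capacity` parameter are hypothetical and chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    center: object   # routing object that represents this node
    radius: float    # covering radius: bounds the distance from center to anything below
    children: list = field(default_factory=list)  # raw objects (leaf) or child Nodes
    is_leaf: bool = True


def cluster_by_sample(items, key, dist, capacity, rng):
    """Group items into clusters of at most `capacity` around sampled seeds.

    Assumption: seeds are picked uniformly at random; this is the simplest
    possible sampling step, not the improved selection the article proposes.
    """
    k = math.ceil(len(items) / capacity)
    seeds = rng.sample(range(len(items)), k)
    clusters = [[] for _ in seeds]
    for item in items:
        # Try seeds from nearest to farthest and take the first with room.
        order = sorted(range(k), key=lambda j: dist(key(item), key(items[seeds[j]])))
        for j in order:
            if len(clusters[j]) < capacity:
                clusters[j].append(item)
                break
    # Drop seeds that ended up with no members.
    return [(items[s], c) for s, c in zip(seeds, clusters) if c]


def bulk_load(objects, dist, capacity=4, seed=0):
    """Build a balanced tree bottom-up: cluster objects into leaves, then nodes into parents."""
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    if not objects:
        raise ValueError("nothing to load")
    rng = random.Random(seed)
    layer = []
    for center, members in cluster_by_sample(list(objects), lambda o: o, dist, capacity, rng):
        radius = max(dist(center, m) for m in members)
        layer.append(Node(center, radius, members, True))
    while len(layer) > 1:
        parents = []
        for seed_node, members in cluster_by_sample(layer, lambda n: n.center, dist, capacity, rng):
            c = seed_node.center
            # Triangle inequality: everything under m lies within dist(c, m.center) + m.radius of c.
            radius = max(dist(c, m.center) + m.radius for m in members)
            parents.append(Node(c, radius, members, False))
        layer = parents
    return layer[0]


def range_search(node, query, r, dist):
    """Return all stored objects within distance r of query, pruning by covering radius."""
    if dist(query, node.center) > r + node.radius:
        return []
    if node.is_leaf:
        return [o for o in node.children if dist(query, o) <= r]
    found = []
    for child in node.children:
        found.extend(range_search(child, query, r, dist))
    return found
```

Because every layer is built from the complete layer beneath it, all leaves end up at the same depth, which is the balance property the abstract attributes to this class of indexes. A call such as `bulk_load(points, dist)` followed by `range_search(root, q, r, dist)` works with any `dist` that satisfies the metric axioms; the covering radii are what allow whole subtrees to be skipped during search.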