Fast similarity join for multi-dimensional data

Authors:
Dmitri V. Kalashnikov; Sunil Prabhakar
Affiliations:
Department of Computer Science, University of California, Irvine, 4300 Calit2 Building, Irvine, CA 92697, USA; Department of Computer Sciences, Purdue University, 250 N. University Street, West Lafayette, IN 47907, USA
Venue:
Information Systems
Year:
2007

Abstract

The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data in these applications ranges from low to high. Most existing methods focus on executing high-dimensional joins over large amounts of disk-based data. The increasing size of main memory on current computers, together with the need for efficient processing of spatial joins, suggests that spatial joins for a large class of problems can be processed in main memory. In this paper, we develop two new in-memory spatial join algorithms, the Grid-join and the EGO*-join, and study their performance. Through evaluation, we explore the domain of applicability of each approach and provide recommendations for choosing a join algorithm based on the dimensionality of the data and the expected selectivity of the join. We show that the two proposed join techniques substantially outperform the state-of-the-art join algorithm, the EGO-join.
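
For readers unfamiliar with the operation, a similarity join in its usual form returns every pair of points, one from each input set, whose distance is at most a threshold ε. The sketch below is only an illustration of that definition: it compares a brute-force join with a generic uniform-grid join held entirely in memory. It is not the Grid-join or EGO*-join of the paper, whose details the abstract does not give; the function names `naive_join` and `grid_join`, the Euclidean metric, and the cell side of ε are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def naive_join(A, B, eps):
    """All index pairs (i, j) with Euclidean distance(A[i], B[j]) <= eps."""
    return {(i, j)
            for i, a in enumerate(A)
            for j, b in enumerate(B)
            if dist(a, b) <= eps}


def grid_join(A, B, eps):
    """Same result as naive_join, but each point of A is compared only
    with the points of B that fall in its own or an adjacent grid cell.
    (Illustrative uniform grid; not the paper's algorithm.)"""
    # Hash every point of B into a cell of side eps.
    cells = defaultdict(list)
    for j, b in enumerate(B):
        cells[tuple(floor(x / eps) for x in b)].append(j)

    result = set()
    for i, a in enumerate(A):
        home = tuple(floor(x / eps) for x in a)
        # A point within eps of `a` differs by at most one cell index
        # per dimension, so checking the 3^d surrounding cells suffices.
        for offset in product((-1, 0, 1), repeat=len(a)):
            cell = tuple(h + o for h, o in zip(home, offset))
            for j in cells.get(cell, ()):
                if dist(a, B[j]) <= eps:
                    result.add((i, j))
    return result


A = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
B = [(0.1, 0.1), (1.2, 0.9), (9.0, 9.0)]
pairs = grid_join(A, B, eps=0.5)
print(sorted(pairs))  # [(0, 0), (1, 1)]
```

This sketch inspects 3^d cells per point, so its cost grows quickly with the number of dimensions d; it is meant to show the idea for low-dimensional data only.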