The similarity join is an important database primitive that has been successfully applied to speed up applications such as similarity search, data analysis, and data mining. It combines two point sets of a multidimensional vector space such that the result contains all point pairs whose distance does not exceed a parameter ε. In this paper, we propose the Epsilon Grid Order, a new algorithm for determining the similarity join of very large data sets. Our solution is based on a particular sort order of the data points, which is obtained by laying an equidistant grid with cell length ε over the data space and comparing the grid cells lexicographically. A typical problem of grid-based approaches such as MSJ or the ε-kdB-tree is that large portions of the data sets must be held simultaneously in main memory; therefore, these approaches do not scale to large data sets. Our technique avoids this problem by using an external sorting algorithm and a particular scheduling strategy during the join phase. The experimental evaluation shows a substantial improvement over competing techniques.
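The sort order described in the abstract can be illustrated with a small in-memory sketch. This is not the paper's algorithm: it leaves out the external sorting and the join-phase scheduling, and it assumes Euclidean distance and data that fits in memory. The names `grid_cell`, `ego_sort`, and `ego_join` are hypothetical and chosen only for illustration. The sketch relies on one property of the order: if two points are within distance ε, their grid cells differ by at most 1 in every dimension, so all join partners of a point lie in a contiguous interval of the sorted sequence.

```python
import bisect
import math


def grid_cell(point, eps):
    """Integer coordinates of the grid cell (side length eps) containing point."""
    return tuple(math.floor(x / eps) for x in point)


def ego_sort(points, eps):
    """Sort points by the lexicographic order of their grid cells."""
    return sorted(points, key=lambda p: grid_cell(p, eps))


def ego_join(r, s, eps):
    """Return all pairs (p, q), p in r and q in s, with distance <= eps.

    Simplified in-memory illustration (an assumption of this sketch):
    s is sorted once, then each p scans only the interval of s whose
    cells lie lexicographically between cell(p) - 1 and cell(p) + 1.
    """
    s_sorted = ego_sort(s, eps)
    s_cells = [grid_cell(q, eps) for q in s_sorted]
    pairs = []
    for p in r:
        cell = grid_cell(p, eps)
        lo = tuple(c - 1 for c in cell)  # smallest cell that can hold a partner
        hi = tuple(c + 1 for c in cell)  # largest cell that can hold a partner
        start = bisect.bisect_left(s_cells, lo)
        stop = bisect.bisect_right(s_cells, hi)
        for q in s_sorted[start:stop]:
            # The interval is only a candidate set; verify the real distance.
            if math.dist(p, q) <= eps:
                pairs.append((p, q))
    return pairs


if __name__ == "__main__":
    r = [(0.1, 0.1), (2.5, 2.5), (0.9, 1.1)]
    s = [(0.2, 0.3), (1.0, 1.0), (5.0, 5.0), (2.4, 3.4)]
    for pair in ego_join(r, s, eps=1.0):
        print(pair)
```

The interval scan still visits candidates that are far away in the later dimensions, which is why every candidate is checked against the actual distance before it is reported.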