Efficient top-k similarity join processing over multi-valued objects

Authors:
Wenjie Zhang;Liming Zhan;Ying Zhang;Muhammad Aamir Cheema;Xuemin Lin
Affiliations:
School of Computer Science & Engineering, University of New South Wales, Sydney, Australia;School of Computer Science & Engineering, University of New South Wales, Sydney, Australia;School of Computer Science & Engineering, University of New South Wales, Sydney, Australia;School of Computer Science & Engineering, University of New South Wales, Sydney, Australia;School of Computer Science & Engineering, University of New South Wales, Sydney, Australia
Venue:
World Wide Web
Year:
2014



Abstract

Top-k similarity joins have been extensively studied and are used in a wide spectrum of applications such as information retrieval, decision making, spatial data analysis and data mining. Given two sets of objects $\mathcal U$ and $\mathcal V$, a top-k similarity join returns the k most similar pairs of objects from $\mathcal U \times \mathcal V$. In the conventional model of top-k similarity join processing, an object is usually regarded as a point in a multi-dimensional space and similarity is measured by a simple distance metric such as Euclidean distance. However, in many applications an object may be described by multiple values (instances), and the conventional model is not applicable because it does not address the distribution of object instances. In this paper, we study top-k similarity joins over multi-valued objects. We apply two types of quantile-based distance measures, $\phi$-quantile distance and $\phi$-quantile group-base distance, to explore the relative instance distribution among the multiple instances of objects. Efficient and effective techniques to process top-k similarity joins over multi-valued objects are developed following a filtering-refinement framework. Novel distance-, statistic- and weight-based pruning techniques are proposed. Comprehensive experiments on both real and synthetic datasets demonstrate the efficiency and effectiveness of our techniques.
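
To make the problem statement concrete, the following is a minimal brute-force sketch of a top-k join over multi-valued objects. It is not the filtering-refinement method of the paper, which the abstract does not detail. The definition of the quantile distance used here is an assumption: the quantile (parameter `phi`) of all pairwise Euclidean distances between the instances of two objects, with every instance pair weighted equally. The function names `quantile_distance` and `top_k_join` are hypothetical.

```python
import heapq
import math
from itertools import product


def quantile_distance(u, v, phi):
    """Assumed definition: the phi-quantile of all pairwise Euclidean
    distances between instances of u and v, each pair weighted equally."""
    dists = sorted(math.dist(a, b) for a, b in product(u, v))
    # Smallest distance whose cumulative share of instance pairs reaches phi.
    idx = max(0, math.ceil(phi * len(dists)) - 1)
    return dists[idx]


def top_k_join(U, V, k, phi=0.5):
    """Brute-force baseline: score every object pair in U x V and keep the
    k pairs with the smallest quantile distance (no pruning is applied)."""
    scored = (
        (quantile_distance(u, v, phi), i, j)
        for (i, u), (j, v) in product(enumerate(U), enumerate(V))
    )
    return heapq.nsmallest(k, scored)


# Each object is a list of instances; each instance is a point.
U = [[(0, 0), (0, 2)], [(10, 10), (10, 12)]]
V = [[(0, 1)], [(10, 11), (10, 13)]]
result = top_k_join(U, V, k=2, phi=0.5)
```

Each entry of `result` is a tuple `(distance, index_in_U, index_in_V)`. The baseline evaluates every object pair and every instance pair, which is the cost that pruning techniques of the kind named in the abstract aim to avoid.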