Similarity queries: their conceptual evaluation, transformations, and processing

Authors:
Yasin N. Silva;Walid G. Aref;Per-Ake Larson;Spencer S. Pearson;Mohamed H. Ali
Affiliations:
Arizona State University, Phoenix, USA;Purdue University, West Lafayette, USA;Microsoft Research, Redmond, USA;Arizona State University, Phoenix, USA;Microsoft Corporation, Redmond, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2013

Abstract

Many application scenarios can benefit significantly from identifying and processing similarities in the data. Although some work has extended the semantics of individual operators, such as join and selection, to make them aware of data similarities, the role and implementation of similarity-aware operations as first-class database operators have received little study. Furthermore, very little work has addressed the problem of evaluating and optimizing queries that combine several similarity operations. This paper studies similarity queries that contain one or more first-class similarity database operators, such as Similarity Selection, Similarity Join, and Similarity Group-by. In particular, we analyze implementation techniques for several similarity operators, introduce a consistent and comprehensive conceptual evaluation model for similarity queries, and present a rich set of transformation rules that extend cost-based query optimization to similarity queries.