Distributed Processing of Similarity Queries

Authors:
Apostolos N. Papadopoulos;Yannis Manolopoulos
Affiliations:
Data Engineering Research Lab., Department of Informatics, Aristotle University, 54006 Thessaloniki, Greece. apostol@delab.csd.auth.gr;Data Engineering Research Lab., Department of Informatics, Aristotle University, 54006 Thessaloniki, Greece. yannis@delab.csd.auth.gr
Venue:
Distributed and Parallel Databases
Year:
2001

Abstract

Many modern applications in diverse fields demand the efficient manipulation of very large multidimensional datasets. Efficient and effective query processing techniques are therefore needed to provide acceptable response times. In this paper, we study the processing of nearest neighbor similarity queries in large distributed multidimensional databases, where objects are represented as vectors in a vector space and are distributed across a multi-computer environment. The departure from the centralized case brings a number of advantages and (unfortunately) a number of difficulties that need to be overcome. From this perspective, four query evaluation strategies are presented: Concurrent Processing (CP), Selective Processing (SP), Two-Phase Processing (2PP) and Probabilistic Processing (PRP). The proposed techniques are compared analytically and experimentally, in order to identify the advantages of each one and the cases where each should be applied. Experimental results demonstrate the performance of each method under different parameter values. We also investigate the impact of the derived data that must be maintained in order to process similarity queries efficiently.
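
To make the problem setting concrete, the following is a minimal Python sketch of a k-nearest-neighbor query over vectors partitioned across several sites. It is a generic scatter/gather illustration, not a reproduction of any of the four strategies, whose details the abstract does not give. The names `local_knn`, `distributed_knn` and `sites`, the Euclidean distance, and the use of threads to stand in for remote servers are all assumptions made for illustration.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def local_knn(site, query, k):
    """Return the k (distance, vector) pairs at one site closest to query."""
    # Assumption: each site scans its own vectors; a real system would
    # more likely use a spatial index instead of a linear scan.
    return heapq.nsmallest(k, ((math.dist(query, v), v) for v in site))


def distributed_knn(sites, query, k):
    """Send the query to every site at once, then merge the local answers."""
    # Threads are a stand-in for independent servers (hypothetical setup).
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        local_answers = list(pool.map(lambda s: local_knn(s, query, k), sites))
    # Every global nearest neighbor is among the k best of the site that
    # stores it, so merging the local top-k lists loses nothing.
    merged = [pair for answer in local_answers for pair in answer]
    return heapq.nsmallest(k, merged)


# Three hypothetical sites, each holding a few 2-D vectors.
sites = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 1.0), (9.0, 9.0)],
    [(0.5, 0.0), (7.0, 2.0)],
]
result = distributed_knn(sites, (0.0, 0.0), 2)
print(result)
```

In this sketch every site is contacted and each returns up to k candidates, so the coordinator receives up to k times the number of sites in candidate objects. That cost in messages and transferred data is the kind of trade-off against response time that a distributed setting introduces.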