Similarity query processing using disk arrays

Authors:
Apostolos N. Papadopoulos;Yannis Manolopoulos
Affiliations:
Department of Informatics, Aristotle University, Thessaloniki 54006, Greece;Department of Informatics, Aristotle University, Thessaloniki 54006, Greece
Venue:
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Year:
1998

Abstract

Similarity queries are fundamental operations used extensively in many modern applications, and disk arrays are powerful storage media of increasing importance. The basic trade-off in similarity query processing on such a system is that increased parallelism leads to higher resource consumption and lower throughput, whereas low parallelism leads to higher response times. Here, we propose a technique that is based on a careful investigation of the currently available data in order to exploit parallelism up to a point, retaining low response times during query processing. The underlying access method is a variation of the R*-tree, which is distributed among the components of a disk array, and the system is modeled using event-driven simulation. The performance results demonstrate that the proposed approach outperforms, by factors, both a previous branch-and-bound algorithm and a greedy algorithm that maximizes parallelism as much as possible. Moreover, a comparison of the proposed algorithm with a hypothetical (non-existing) optimal one (with respect to the number of disk accesses) shows that the former is on average two times slower than the latter.
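
The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not the authors' algorithm or simulator: `Page`, `search`, the `batch` parameter, and the sample data are all hypothetical, and each disk is assumed to serve one page per time step. It runs a 1-nearest-neighbor search over leaf pages spread across three disks, fetching `batch` pages per round in order of their lower-bound distance and pruning between rounds.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Page(NamedTuple):
    mindist: float  # lower bound on the distance from the query to the page
    nearest: float  # true distance of the closest object stored in the page
    disk: int       # disk that holds the page (hypothetical declustering)


def search(pages: List[Page], batch: int) -> Tuple[int, int]:
    """Return (page accesses, elapsed time steps) for a toy 1-NN search."""
    pending = sorted(pages)  # ascending mindist
    best = float("inf")      # best distance found so far
    accesses = steps = 0
    while True:
        # Prune pages that can no longer improve the answer.
        pending = [p for p in pending if p.mindist < best]
        if not pending:
            return accesses, steps
        group, pending = pending[:batch], pending[batch:]
        accesses += len(group)
        # Pages on the same disk are read one after another (assumption).
        steps += max(Counter(p.disk for p in group).values())
        best = min(best, min(p.nearest for p in group))


pages = [
    Page(1.0, 2.5, 0), Page(1.5, 2.2, 1), Page(1.9, 2.0, 2),
    Page(2.5, 2.6, 0), Page(3.0, 3.5, 1), Page(4.0, 4.2, 2),
]

print(search(pages, 1))           # serial branch-and-bound: (3, 3)
print(search(pages, 3))           # one page per disk per round: (3, 1)
print(search(pages, len(pages)))  # greedy, fetch everything: (6, 2)
```

On this sample, the serial search reads few pages but takes the most time steps, the greedy search reads every page, and the intermediate setting gets the low access count and the low step count together. That is only a property of this hand-picked data, not a result from the paper.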