Efficiently Supporting Multiple Similarity Queries for Mining in Metric Databases

Authors:
Affiliations:
Venue: ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Year: 2000

Citing 0
Cited 13

Epsilon grid order: an algorithm for the similarity join on massive high-dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data

Multiple Similarity Queries: A Basic DBMS Operation for Mining in Metric Databases
IEEE Transactions on Knowledge and Data Engineering

Discovery of Influence Sets in Frequently Updated Databases
Proceedings of the 27th International Conference on Very Large Data Bases

The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries
DaWaK 2000 Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery

High dimensional reverse nearest neighbor queries
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

Incremental and effective data summarization for dynamic hierarchical clustering
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Multi-step density-based clustering
Knowledge and Information Systems

Efficient processing of complex similarity queries in RDBMS through query rewriting
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management

Towards a novel approach to multimedia data mixed fragmentation
Proceedings of the International Conference on Management of Emergent Digital EcoSystems

Processing distance-based queries in multidimensional data spaces using R-trees
PCI'01 Proceedings of the 8th Panhellenic conference on Informatics

A comparative analysis of similarity measurement techniques through SimReq framework
Proceedings of the 7th International Conference on Frontiers of Information Technology

Efficient processing of multiple DTW queries in time series databases
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management

Towards multimedia fragmentation
ADBIS'06 Proceedings of the 10th East European conference on Advances in Databases and Information Systems

Abstract

Metric databases are databases in which a metric distance function is defined for pairs of database objects. In such databases, similarity queries in the form of range queries or k-nearest neighbor queries are the most important queries. In traditional query processing, single queries are issued independently by different users. In many data mining applications, however, the database is typically explored by iteratively issuing similarity queries for the answers of previous similarity queries.

In this paper, we introduce a generic scheme for such data mining algorithms and we investigate two orthogonal approaches, reducing I/O cost as well as CPU cost, to speed up the processing of multiple similarity queries. The proposed techniques apply to any type of similarity query and to implementations based on either an index or a sequential scan. Parallelization yields an additional impressive speed-up. An extensive performance evaluation confirms the efficiency of our approach.
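
The abstract does not spell out the two approaches, so the following is only an illustrative sketch of how I/O and CPU savings for a batch of queries could look, not the paper's actual algorithm. It assumes range queries answered by a sequential scan: every database object is read once for the whole batch (shared I/O), and precomputed query-to-query distances are used with the triangle inequality to skip some distance computations (CPU). The function name `multiple_range_queries`, the Euclidean metric, and the returned computation counter are all hypothetical choices made for this example.

```python
import math

def multiple_range_queries(database, queries, eps, dist=math.dist):
    """Answer several range queries with one pass over the database.

    Illustrative sketch only: `dist` must be a metric so that the
    triangle inequality holds. Returns (results, computed), where
    results[i] lists the objects within eps of queries[i] and
    computed counts the object-to-query distance evaluations.
    """
    m = len(queries)
    # Distances between the query objects, computed once per batch.
    qq = [[dist(queries[i], queries[j]) for j in range(m)] for i in range(m)]

    results = [[] for _ in queries]
    computed = 0
    for obj in database:  # each object is read once for all queries
        known = []  # (query index, distance) pairs already computed for obj
        for i in range(m):
            # Triangle inequality: |d(q_j, obj) - d(q_i, q_j)| <= d(q_i, obj).
            # If this lower bound exceeds eps, obj cannot be an answer for q_i.
            if any(abs(d - qq[i][j]) > eps for j, d in known):
                continue
            d_i = dist(queries[i], obj)
            computed += 1
            known.append((i, d_i))
            if d_i <= eps:
                results[i].append(obj)
    return results, computed


database = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
queries = [(0.0, 0.0), (10.0, 10.0)]
results, computed = multiple_range_queries(database, queries, eps=1.5)
```

In this toy run the two query objects are far apart, so knowing an object's distance to the first query is usually enough to rule it out for the second without computing that distance. The pruning test only ever discards objects that cannot qualify, so the answers match those of independent scans.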