Similarity search in transaction databases with a two-level bounding mechanism

Authors:
Jo-Chun Chuang;Chung-Wen Cho;Arbee L. P. Chen
Affiliations:
Department of Computer Science, National Tsing Hua University, Taiwan, R.O.C.;Department of Computer Science, National Tsing Hua University, Taiwan, R.O.C.;Department of Computer Science, National Chengchi University, Taiwan, R.O.C.
Venue:
DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
Year:
2006

Citing 10
Cited 0

Nearest neighbor queries

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
A new method for similarity indexing of market basket data

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Efficient and tunable similar set retrieval

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
A Fast Algorithm to Cluster High Dimensional Basket Data

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Efficient similarity search for market basket data

The VLDB Journal — The International Journal on Very Large Data Bases
CLOPE: a fast and effective clustering algorithm for transactional data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Center-Based Indexing for Nearest Neighbors Search

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Segmenting Customer Transactions Using a Pattern-Based Clustering Approach

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Localized signature table: fast similarity search on transaction data

Proceedings of the thirteenth ACM international conference on Information and knowledge management

Abstract

In this paper, we propose a novel indexing method for similarity search in transaction databases where the frequency of database updates can be high. In our method, incoming transactions are incrementally classified into clusters. The transactions in a cluster are represented by two features: the union and the intersection of all the transactions in that cluster. Based on these two features, the transactions in a cluster are further divided into disjoint groups. As a result, all the transactions are organized into a two-level index structure. With this index, a transaction can be inserted quickly because only one particular cluster needs to be modified. Moreover, when conducting a similarity search, we can compute, at each level, lower and upper bounds on the distance between the query and each transaction in the cluster. Based on these bounds, the cost of distance computation can be greatly reduced.
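
The bounding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the distance function, so the sketch assumes the Hamming distance between item sets (the size of the symmetric difference), and the names `bounds`, `summarize`, and `search` are hypothetical. Under that assumption, any transaction T in a set of transactions with intersection I and union U satisfies I ⊆ T ⊆ U, which gives |Q − U| + |I − Q| as a lower bound and |Q − I| + |U − Q| as an upper bound on the distance from a query Q to T.

```python
from typing import FrozenSet, List, Optional, Tuple

Transaction = FrozenSet[str]
Group = List[Transaction]
Cluster = List[Group]  # a cluster is split into disjoint groups


def hamming(a: Transaction, b: Transaction) -> int:
    """Assumed distance: number of items in exactly one of the two sets."""
    return len(a ^ b)


def summarize(transactions: List[Transaction]) -> Tuple[Transaction, Transaction]:
    """Return (intersection, union) of a non-empty list of transactions."""
    inter = frozenset.intersection(*transactions)
    union = frozenset.union(*transactions)
    return inter, union


def bounds(query: Transaction, inter: Transaction, union: Transaction) -> Tuple[int, int]:
    """Lower and upper bounds on hamming(query, t) for any t with inter <= t <= union."""
    lower = len(query - union) + len(inter - query)
    upper = len(query - inter) + len(union - query)
    return lower, upper


def search(query: Transaction, clusters: List[Cluster]) -> Tuple[Optional[Transaction], float, int]:
    """Nearest-neighbor search that prunes at the cluster level, then the group level.

    Returns (best transaction, its distance, number of exact distance computations).
    Summaries are recomputed here for brevity; an index would store them
    and update them incrementally on insertion.
    """
    best_tx: Optional[Transaction] = None
    best_d: float = float("inf")
    computed = 0
    for cluster in clusters:
        members = [t for group in cluster for t in group]
        lower, _ = bounds(query, *summarize(members))
        if lower >= best_d:
            continue  # no transaction in this cluster can beat the current best
        for group in cluster:
            lower, _ = bounds(query, *summarize(group))
            if lower >= best_d:
                continue  # the whole group is pruned without computing distances
            for t in group:
                computed += 1
                d = hamming(query, t)
                if d < best_d:
                    best_tx, best_d = t, d
    return best_tx, best_d, computed
```

In this sketch only the lower bound drives pruning. The upper bound is returned as well because it could be used to tighten the best-so-far distance before any exact computation, which is one plausible reading of how both bounds reduce cost.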