In this paper, we propose a novel indexing method for similarity search in transaction databases that may be updated frequently. In our method, incoming transactions are incrementally classified into clusters. The transactions in each cluster are summarized by two features: the union and the intersection of all transactions in that cluster. Based on these two features, the transactions in a cluster are further divided into disjoint groups, so that all transactions are organized into a two-level index structure. With this index, a transaction can be inserted quickly, because only one cluster needs to be modified. Moreover, during a similarity search, we can compute at each level lower and upper bounds on the distance between the query and each transaction in the cluster. These bounds let many exact distance computations be skipped, which greatly reduces the cost of distance computation.
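The following is a minimal sketch of how a union/intersection summary can yield distance bounds. It is not the authors' implementation. The abstract does not name the distance function, the rule for assigning a new transaction to a cluster, or how groups are formed, so the sketch makes three assumptions. First, distance is the Hamming distance between item sets, that is, the size of the symmetric difference. Second, `insert` uses a hypothetical assignment rule based on the "spread" |union − intersection| with a hypothetical `max_spread` parameter. Third, only the cluster level is modeled, and the second level of groups is left out.

Because every member T of a cluster satisfies intersection ⊆ T ⊆ union, the query's items outside the union are always mismatches, and so are the intersection's items outside the query. Together these give a lower bound. An upper bound follows from the items of the query outside the intersection plus the items of the union outside the query.

```python
from dataclasses import dataclass, field


# Assumption: distance = Hamming distance between item sets (|Q symmetric-diff T|).
def hamming(q: frozenset, t: frozenset) -> int:
    return len(q ^ t)


@dataclass
class Cluster:
    """A cluster summarized by the union and intersection of its transactions."""
    members: list = field(default_factory=list)
    union: frozenset = frozenset()
    inter: frozenset = frozenset()

    def add(self, t: frozenset) -> None:
        # Inserting a transaction updates only this cluster's two features.
        if not self.members:
            self.union, self.inter = t, t
        else:
            self.union = self.union | t
            self.inter = self.inter & t
        self.members.append(t)

    def bounds(self, q: frozenset) -> tuple:
        # Every member T satisfies inter <= T <= union, hence:
        #   lower: items of q outside the union + common items absent from q
        #   upper: items of q outside the intersection + union items absent from q
        lower = len(q - self.union) + len(self.inter - q)
        upper = len(q - self.inter) + len(self.union - q)
        return lower, upper


def insert(clusters: list, t: frozenset, max_spread: int) -> None:
    # Hypothetical assignment rule (not from the paper): join the cluster whose
    # spread |union - inter| stays smallest after adding t, or open a new
    # cluster if even that spread would exceed max_spread.
    best, best_spread = None, None
    for c in clusters:
        spread = len((c.union | t) - (c.inter & t))
        if best is None or spread < best_spread:
            best, best_spread = c, spread
    if best is None or best_spread > max_spread:
        best = Cluster()
        clusters.append(best)
    best.add(t)


def range_search(clusters: list, q: frozenset, radius: int) -> list:
    """Return all transactions within `radius` of q, using bounds to skip work."""
    result = []
    for c in clusters:
        lower, upper = c.bounds(q)
        if lower > radius:
            continue                   # no member can qualify: prune the cluster
        if upper <= radius:
            result.extend(c.members)   # every member qualifies: no exact distances
            continue
        # Bounds are inconclusive: fall back to exact distance computation.
        result.extend(t for t in c.members if hamming(q, t) <= radius)
    return result
```

As a usage example, inserting `{"milk", "bread"}`, `{"milk", "bread", "eggs"}` and `{"beer", "chips"}` with `max_spread=2` produces two clusters. A range search for `{"milk", "bread"}` with `radius=1` then returns the first cluster's two transactions without any exact distance computation, because that cluster's upper bound is 1. The second cluster is pruned by its lower bound of 4.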