Localized signature table: fast similarity search on transaction data

Authors:
Qiang Jing;Rui Yang;Panos Kalnis;Anthony K. H. Tung
Affiliations:
National University of Singapore, Singapore;National University of Singapore, Singapore;National University of Singapore, Singapore;National University of Singapore, Singapore
Venue:
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Year:
2004

Citing 12
Cited 1

Algorithms for clustering data

Algorithms for clustering data
R-trees: a dynamic index structure for spatial searching

Readings in database systems
Nearest neighbor queries

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
A new method for similarity indexing of market basket data

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Efficient and tunable similar set retrieval

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
A Fast Algorithm to Cluster High Dimensional Basket Data

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
On B-Tree Indices for Skewed Distributions

VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Efficient similarity search for market basket data

The VLDB Journal — The International Journal on Very Large Data Bases

Similarity search in transaction databases with a two-level bounding mechanism

DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications


Abstract

Recently, techniques for supporting efficient similarity search over huge transaction datasets have emerged as an important research area. Several indexing schemes have been proposed in this direction. Typically, these schemes trade off search efficiency against the space overhead of the index. In this paper, we propose a novel indexing scheme for similarity search on transaction data. Based on well-studied clustering techniques, we develop a construction algorithm for the proposed index and a branch-and-bound search strategy for answering similarity queries. Unlike previous techniques, our indexing scheme achieves high search efficiency and low space requirements at the cost of longer pre-computation time. This behavior is ideal for applications with low update but high read volume (e.g., data warehousing and collaborative filtering). Moreover, our experimental results show that our method is robust to varying dataset characteristics.