A new method for similarity indexing of market basket data

Authors:
Charu C. Aggarwal; Joel L. Wolf; Philip S. Yu
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY; IBM T. J. Watson Research Center, Yorktown Heights, NY; IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Year:
1999

Abstract

In recent years, many data mining methods have been proposed for extracting useful, structured information from market basket data. The association rule model was recently proposed to discover useful patterns and dependencies in such data. This paper presents a method for efficiently indexing market basket data for similarity search. The technique is likely to be useful in applications that exploit similarity in customer buying behavior to make peer recommendations. We propose an index called the signature table, which flexibly supports a wide range of similarity functions. The construction of the index structure is independent of the similarity function, which can be specified at query time. The resulting similarity search algorithm scales well with increasing memory availability and database size.
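
The abstract gives no construction details, so the following Python sketch is only an illustration of the general idea: an index built without reference to any similarity function, with the function supplied at query time. Everything in it is an assumption made for the example rather than the authors' design: items are integers, they are grouped into signatures by `item % num_signatures`, a basket is filed under the set of signatures it touches, and the similarity function takes the number of matching items and the Hamming distance and must not decrease with the former or increase with the latter. The names `build_signature_table` and `nearest_basket` are hypothetical.

```python
from collections import defaultdict


def build_signature_table(transactions, num_signatures):
    """File each basket id under the set of signatures its items touch.

    Assumption for this sketch: item i belongs to signature i % num_signatures.
    No similarity function is involved in building the table.
    """
    table = defaultdict(list)
    for tid, basket in enumerate(transactions):
        key = frozenset(item % num_signatures for item in basket)
        table[key].append(tid)
    return table


def nearest_basket(table, transactions, target, num_signatures, similarity):
    """Return (tid, score) of a best basket under similarity(matches, hamming).

    The similarity function is chosen at query time. It must be
    non-decreasing in matches and non-increasing in hamming, so that the
    per-entry bound below is truly optimistic.
    """
    # How many target items fall into each signature.
    target_counts = [0] * num_signatures
    for item in target:
        target_counts[item % num_signatures] += 1

    def optimistic_bound(key):
        # Matches can only come from signatures the entry's baskets touch.
        match_ub = sum(target_counts[j] for j in key)
        # Untouched signature: every target item there is a mismatch.
        # Touched signature with no target items: at least one mismatch.
        dist_lb = sum(
            target_counts[j] if j not in key else int(target_counts[j] == 0)
            for j in range(num_signatures)
        )
        return similarity(match_ub, dist_lb)

    ranked = sorted(
        ((optimistic_bound(key), key) for key in table),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best_tid, best_score = None, float("-inf")
    for bound, key in ranked:
        if bound <= best_score:
            break  # no remaining entry can beat the best basket found so far
        for tid in table[key]:
            basket = transactions[tid]
            score = similarity(len(basket & target), len(basket ^ target))
            if score > best_score:
                best_tid, best_score = tid, score
    return best_tid, best_score


def jaccard(matches, hamming):
    """One possible query-time similarity function."""
    return matches / (matches + hamming) if matches + hamming else 0.0


baskets = [{1, 2, 3}, {4, 5}, {1, 2, 7}, {8, 9, 10}]
index = build_signature_table(baskets, num_signatures=4)
result = nearest_basket(index, baskets, {1, 2, 3, 4}, 4, jaccard)
```

The same `index` can be queried again with a different function, for example `lambda m, h: m - h`, without being rebuilt. The early exit is what lets a search skip whole table entries whose best possible score cannot beat a basket already seen.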