Association-based similarity testing and its applications

Authors:
Tao Li;Mitsunori Ogihara;Shenghuo Zhu
Affiliations:
(Correspd. Tel.: +1 585 275 8479/ Fax: +1 585 273 4556/ E-mail: taoli@cs.rochester.edu) Computer Science Department, University of Rochester, Rochester, NY 14627-0226, USA;Computer Science Department, University of Rochester, Rochester, NY 14627-0226, USA;Computer Science Department, University of Rochester, Rochester, NY 14627-0226, USA
Venue:
Intelligent Data Analysis
Year:
2003

Citing 36
Cited 2

Fibonacci heaps and their uses in improved network optimization algorithms

Journal of the ACM (JACM)
Algorithms for clustering data

Algorithms for clustering data
Faster scaling algorithms for network problems

SIAM Journal on Computing
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Similarity-based queries

PODS '95 Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Mining quantitative association rules in large relational tables

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Similarity-based queries for time series data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Scalable parallel data mining for association rules

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Fast discovery of association rules

Advances in knowledge discovery and data mining
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A framework for measuring changes in data characteristics

PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Learning in graphical models

Learning in graphical models
Efficient progressive sampling

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining the most interesting rules

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Algorithms for association rule mining — a general survey and comparison

ACM SIGKDD Explorations Newsletter
A fast distributed algorithm for mining association rules

DIS '96 Proceedings of the fourth international conference on on Parallel and distributed information systems
Clustering Algorithms

Clustering Algorithms
Advances in Distributed and Parallel Knowledge Discovery

Advances in Distributed and Parallel Knowledge Discovery
A Decomposition Theorem for Maximum Weight Bipartite Matchings

SIAM Journal on Computing
Parallel Algorithms for Discovery of Association Rules

Data Mining and Knowledge Discovery
Finding Interesting Associations without Support Pruning

IEEE Transactions on Knowledge and Data Engineering
Efficient Similarity Search In Sequence Databases

FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases

Proceedings of the 17th International Conference on Data Engineering
Efficiently Mining Maximal Frequent Itemsets

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Exploiting Dataset Similarity for Distributed Mining

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Finding Similar Time Series

PKDD '97 Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery
Proximity Search in Databases

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
CoFD: An Algorithm for Non-distance Based Clustering in High Dimensional Spaces

DaWaK 2000 Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery
On Similarity Queries for Time-Series Data: Constraint Specification and Implementation

CP '95 Proceedings of the First International Conference on Principles and Practice of Constraint Programming
Similarity of event sequences

TIME '97 Proceedings of the 4th International Workshop on Temporal Representation and Reasoning (TIME '97)
Evaluation of Sampling for Data Mining of Association Rules

Evaluation of Sampling for Data Mining of Association Rules
A new distributed data mining model based on similarity

Proceedings of the 2003 ACM symposium on Applied computing

Comparing Datasets Using Frequent Itemsets: Dependency on the Mining Parameters

SETN '08 Proceedings of the 5th Hellenic conference on Artificial Intelligence: Theories, Models and Applications
CLAP: Collaborative pattern mining for distributed information systems

Decision Support Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes a new similarity measure between basket datasets based on associations. The new measure is calculated from support counts using a formula inspired by information entropy. Experiments on both real and synthetic datasets show the effectiveness of the measure. This paper then investigates the applications of the similarity measure. It first studies the problem of finding a mapping between categorical database attribute sets using similarity measures. A generic approach for identifying such a mapping is proposed. The approach is implemented based on the similarity measure proposed in the paper and its performance has been evaluated and validated. Moreover, this paper also explores the applications of using the similarity measure to mine distributed datasets.