Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution

Authors:
Hui Xiong; Pang-Ning Tan; Vipin Kumar
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

Abstract

Existing association-rule mining algorithms often rely on the support-based pruning strategy to prune their combinatorial search space. This strategy is not quite effective for data sets with skewed support distributions because they tend to generate many spurious patterns involving items from different support levels or miss potentially interesting low-support patterns. To overcome these problems, we propose the concept of hyperclique pattern, which uses an objective measure called h-confidence to identify strong affinity patterns. We also introduce the novel concept of cross-support property for eliminating patterns involving items with substantially different support levels. Our experimental results demonstrate the effectiveness of this method for finding patterns in dense data sets even at very low support thresholds, where most of the existing algorithms would break down. Finally, hyperclique patterns also show great promise for clustering items in high-dimensional space.
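
The abstract names h-confidence and the cross-support property without defining them. The sketch below illustrates both on a toy data set, assuming the commonly stated definitions: the h-confidence of an itemset is its support divided by the largest support of any single item in it, and a pattern is cross-support at a threshold t when the ratio of its smallest to its largest item support falls below t. The function names, the toy transactions, and the threshold value 0.5 are all illustrative assumptions, not taken from the paper, and the code is not the authors' mining algorithm.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy data: milk and bread are frequent, caviar and vodka are
# rare but always appear together (a skewed support distribution).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset(t) for t in (
        {"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"},
        {"milk"}, {"milk", "caviar", "vodka"}, {"bread"},
        {"milk", "bread"}, {"milk"}, {"milk", "bread"},
        {"caviar", "vodka"},
    )
]


def support_count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def h_confidence(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Assumed definition: supp(P) / max over items i in P of supp({i})."""
    items = frozenset(itemset)
    max_item = max(support_count({i}, transactions) for i in items)
    if max_item == 0:
        return 0.0
    return support_count(items, transactions) / max_item


def support_ratio(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Smallest single-item support divided by the largest one."""
    counts = [support_count({i}, transactions) for i in frozenset(itemset)]
    return min(counts) / max(counts) if max(counts) > 0 else 0.0


def is_cross_support(itemset: Iterable[str],
                     transactions: List[FrozenSet[str]],
                     threshold: float) -> bool:
    """Assumed definition: the pattern mixes items whose support ratio is below the threshold."""
    return support_ratio(itemset, transactions) < threshold


if __name__ == "__main__":
    for pattern in ({"caviar", "vodka"}, {"milk", "bread"}, {"milk", "caviar"}):
        print(sorted(pattern),
              "support:", support_count(pattern, TRANSACTIONS) / len(TRANSACTIONS),
              "h-confidence:", h_confidence(pattern, TRANSACTIONS),
              "cross-support at 0.5:", is_cross_support(pattern, TRANSACTIONS, 0.5))
```

On this toy data, {caviar, vodka} has low support yet an h-confidence of 1.0, while {milk, caviar} combines a frequent and a rare item and is flagged as cross-support. Under the assumed definitions the support ratio is an upper bound on h-confidence, which is why a cross-support check could discard such patterns before their support is counted.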