Cohesion: A concept and framework for confident association discovery with potential application in microarray mining

Authors:
Ramkishore Bhattacharyya
Affiliations:
Microsoft India (R&D) Pvt. Ltd., Gachibowli, Hyderabad 500 032, India
Venue:
Applied Soft Computing
Year:
2011

Citing 28
Cited 6

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Dynamic itemset counting and implication rules for market basket data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
A new framework for itemset generation

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Mining the most interesting rules

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining association rules with multiple minimum supports

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Data Mining Techniques: For Marketing, Sales, and Customer Support

Data Mining Techniques: For Marketing, Sales, and Customer Support
Scalable Algorithms for Association Mining

IEEE Transactions on Knowledge and Data Engineering
Alternative Interest Measures for Mining Associations in Databases

IEEE Transactions on Knowledge and Data Engineering
LPMiner: An Algorithm for Finding Frequent Itemsets Using Length-Decreasing Support Constraint

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Efficient Mining of High Confidience Association Rules without Support Thresholds

PKDD '99 Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery
Mining Frequent Itemsets Using Support Constraints

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Discovery of Multiple-Level Association Rules from Large Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
A Confidence-Lift Support Specification for Interesting Associations Mining

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Finding Interesting Associations without Support Pruning

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Mining association rules on significant rare data using relative support

Journal of Systems and Software
Fast vertical mining using diffsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Carpenter: finding closed patterns in long biological datasets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
FARMER: finding interesting rule groups in microarray datasets

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Mining Frequent Closed Patterns in Microarray Data

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Mining top-K covering rule groups for gene expression data

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Efficiently Mining Gene Expression Data via a Novel Parameterless Clustering Method

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Using Information-Theoretic Measures to Assess Association Rule Interestingness

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Association against dissociation: some pragmatic considerations for frequent itemset generation under fixed and variable thresholds

ACM SIGKDD Explorations Newsletter
On discovery of maximal confident rules without support pruning in microarray data

Proceedings of the 5th international workshop on Bioinformatics
Accurate Cancer Classification Using Expressions of Very Few Genes

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Bioinformatics with soft computing

IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews

Isolating top-k dense regions with filtration of sparse background

Pattern Recognition Letters
An improved association rules mining method

Expert Systems with Applications: An International Journal
eXploratory K-Means: A new simple and efficient algorithm for gene clustering

Applied Soft Computing
Evolutionary Generalized Radial Basis Function neural networks for improving prediction accuracy in gene classification using feature selection

Applied Soft Computing
Towards applying associative classifier for genetic variants

ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part I
A case study of muscle dysmorphia disorder diagnostics

Expert Systems with Applications: An International Journal

Abstract

The minimum frequency constraint in classical association mining algorithms is a bottleneck: it prevents discovery of the large number of infrequent associations that may be rich in knowledge content. Lowering the frequency threshold not only incurs the heavy cost of pattern explosion but also reduces the reliability of the discovered knowledge. The goal of this paper is to devise a new framework that addresses two needs. The first is the discovery of confident associations unconstrained by the classical minimum frequency. The second is to ensure the quality of the discovered rules. We propose a new property among items, termed cohesion, and develop scalable cohesion-based algorithms for confident association discovery. To assess the quality of rules in terms of knowledge content, we propose two new measures, accuracy and predictability, based on documented associations. Experiments with market-basket data as well as microarray data establish the superiority of the cohesion-based technique in terms of both the amount and the quality of discovered knowledge.