Relative risk and odds ratio: a data mining perspective

Authors:
Haiquan Li; Jinyan Li; Limsoon Wong; Mengling Feng; Yap-Peng Tan
Affiliations:
Institute for Infocomm Research, Singapore; Institute for Infocomm Research, Singapore; Institute for Infocomm Research, Singapore; Nanyang Technological University, Singapore; Nanyang Technological University, Singapore
Venue:
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2005


Abstract

We are often interested in testing whether a given cause has a given effect. If we cannot specify the nature of the factors involved, such tests are called model-free studies. There are two major strategies for demonstrating associations between risk factors (i.e., patterns) and outcome phenotypes (i.e., class labels). The first is the prospective study design, whose analysis is based on the concept of "relative risk": what fraction of the exposed (i.e., having the pattern) or unexposed (i.e., lacking the pattern) individuals have the phenotype (i.e., the class label)? The second is the retrospective study design, whose analysis is based on the concept of "odds ratio": the odds that a case has been exposed to a risk factor are compared with the odds for a case that has not been exposed. The efficient extraction of patterns that have good relative risk and/or odds ratio has not previously been studied in the data mining context. In this paper, we investigate such patterns. We show that this pattern space can be systematically stratified into plateaus of convex spaces based on their support levels. Exploiting convexity, we formulate a number of sound and complete algorithms to extract the most general and the most specific of such patterns at each support level. We compare these algorithms. We further demonstrate that the most efficient among these algorithms is able to mine these sophisticated patterns at a speed comparable to that of mining frequent closed patterns, which are patterns that satisfy considerably simpler conditions.
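
To make the two measures concrete, the following minimal sketch computes them for a single pattern over a small labelled dataset. The abstract does not give formulas, so the code assumes the standard epidemiological definitions based on a 2x2 contingency table. The toy records, the item names, and the function names `contingency`, `relative_risk` and `odds_ratio` are all hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List, Tuple

# A record is (set of items, class label); True means the phenotype is present.
Record = Tuple[FrozenSet[str], bool]


def contingency(pattern: FrozenSet[str], records: List[Record]) -> Tuple[int, int, int, int]:
    """Count the 2x2 table (a, b, c, d) for a pattern.

    a: exposed and positive      b: exposed and negative
    c: unexposed and positive    d: unexposed and negative
    A record is "exposed" when it contains every item of the pattern.
    """
    a = b = c = d = 0
    for items, positive in records:
        exposed = pattern <= items  # subset test
        if exposed and positive:
            a += 1
        elif exposed:
            b += 1
        elif positive:
            c += 1
        else:
            d += 1
    return a, b, c, d


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk among the exposed divided by risk among the unexposed.

    Assumes both groups are non-empty and c > 0; otherwise Python
    raises ZeroDivisionError.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio (a * d) / (b * c); assumes b > 0 and c > 0."""
    return (a * d) / (b * c)


# Hypothetical toy data: four exposed and four unexposed individuals.
records: List[Record] = [
    (frozenset({"smoker", "male"}), True),
    (frozenset({"smoker"}), True),
    (frozenset({"smoker"}), True),
    (frozenset({"smoker", "male"}), False),
    (frozenset({"male"}), True),
    (frozenset({"male"}), False),
    (frozenset(), False),
    (frozenset(), False),
]

pattern = frozenset({"smoker"})
table = contingency(pattern, records)
print(table)                  # (3, 1, 1, 3)
print(relative_risk(*table))  # 3.0
print(odds_ratio(*table))     # 9.0
```

On this toy table the exposed group has a risk of 3/4 and the unexposed group a risk of 1/4, so the relative risk is 3, while the odds ratio is (3 x 3) / (1 x 1) = 9. The sketch evaluates one pattern at a time; it does not attempt the efficient enumeration of patterns that the paper addresses.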