Efficient discovery of risk patterns in medical data

Authors:
Jiuyong Li; Ada Wai-chee Fu; Paul Fahey
Affiliations:
School of Computer and Information Science, University of South Australia, Mawson Lakes, Adelaide 5095, South Australia, Australia; Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; Department of Mathematics and Computing, University of Southern Queensland, Toowoomba 4350, Queensland, Australia
Venue:
Artificial Intelligence in Medicine
Year:
2009

Abstract

Objective: This paper studies the problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which is widely used in epidemiological research.

Methods: To avoid fruitless search during the complete exploration of risk patterns, we define the optimal risk pattern set, which excludes superfluous patterns, i.e. complicated patterns with a lower relative risk than their corresponding simpler patterns. We prove that mining optimal risk pattern sets satisfies an anti-monotone property, and we propose an efficient mining algorithm based on this property. We also propose a hierarchical structure for presenting discovered patterns for easy perusal by domain experts.

Results: The proposed approach is compared with two well-known rule discovery methods, decision tree and association rule mining, on benchmark data sets, and is applied to a real-world application. The proposed method discovers more and better-quality risk patterns than the decision tree approach, which is not designed for such applications and is inadequate for pattern exploration. Unlike the association rule mining approach, the proposed method does not discover a large number of uninteresting superfluous patterns, and it is also more efficient. A real-world case study shows that the method reveals some risk patterns that are interesting to medical practitioners.

Conclusion: The proposed method is an efficient approach to exploring risk patterns. It quickly identifies cohorts of patients that are vulnerable to a risk outcome in a large data set. The method is useful for exploratory studies on large medical data to generate and refine hypotheses, and for designing medical surveillance systems.
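
To make the two central notions of the abstract concrete, the following is a minimal Python sketch of relative risk and of the test for a superfluous pattern. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the check is brute force and does not use the anti-monotone property for pruning. Relative risk is taken in its standard epidemiological form, the outcome rate among records matching the pattern divided by the rate among the remaining records. The toy records, the item names "smoker" and "old", and the function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy cohort: (set of attribute items, outcome observed?)
records = [
    (frozenset({"smoker", "old"}), True),
    (frozenset({"smoker", "old"}), True),
    (frozenset({"smoker"}), True),
    (frozenset({"smoker"}), False),
    (frozenset({"old"}), False),
    (frozenset({"old"}), False),
    (frozenset(), False),
    (frozenset(), False),
    (frozenset({"old"}), True),
    (frozenset(), False),
]


def relative_risk(records, pattern):
    """Outcome rate among records containing `pattern`, divided by the
    outcome rate among records that do not contain it.
    Returns None when the ratio is undefined."""
    pattern = frozenset(pattern)
    exposed = [outcome for items, outcome in records if pattern <= items]
    unexposed = [outcome for items, outcome in records if not pattern <= items]
    if not exposed or not unexposed:
        return None  # one of the two groups is empty
    risk_exposed = sum(exposed) / len(exposed)
    risk_unexposed = sum(unexposed) / len(unexposed)
    if risk_unexposed == 0:
        return float("inf") if risk_exposed > 0 else None
    return risk_exposed / risk_unexposed


def is_superfluous(records, pattern):
    """True if some simpler pattern (a proper, non-empty subset) has a
    relative risk at least as high as that of `pattern` (brute force)."""
    pattern = frozenset(pattern)
    rr = relative_risk(records, pattern)
    if rr is None:
        raise ValueError("relative risk of the pattern is undefined")
    for size in range(1, len(pattern)):
        for subset in combinations(sorted(pattern), size):
            rr_sub = relative_risk(records, subset)
            if rr_sub is not None and rr_sub >= rr:
                return True
    return False


rr_smoker = relative_risk(records, {"smoker"})        # (3/4) / (1/6), about 4.5
rr_both = relative_risk(records, {"smoker", "old"})   # (2/2) / (2/8) = 4.0
print(rr_smoker, rr_both, is_superfluous(records, {"smoker", "old"}))
```

In this toy cohort the longer pattern {smoker, old} has a lower relative risk than the simpler pattern {smoker}, so it would be excluded from the optimal risk pattern set as the abstract describes. Enumerating every subset is exponential in pattern length, which is the cost that an anti-monotone pruning rule is meant to avoid.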