Compass: A hybrid method for clinical and biobank data mining

Authors:
K. Krysiak-Baltyn;T. Nordahl Petersen;K. Audouze;Niels Jørgensen;L. Ängquist;S. Brunak
Affiliations:
Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark;Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark;Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark;University Department of Growth and Reproduction, Rigshospitalet, Copenhagen, Denmark;Institute of Preventative Medicine, Bispebjerg and Frederiksberg Hospitals - The Capital Region, Frederiksberg, Denmark;Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark and NNF Center for Protein Research, Health Sciences Faculty, University of Copenhagen, Denmark
Venue:
Journal of Biomedical Informatics
Year:
2014

Abstract

We describe a new method for identifying confident associations within large clinical data sets. The method is a hybrid of two existing methods: Self-Organizing Maps and Association Mining. We use Self-Organizing Maps as an initial step to reduce the search space, and then apply Association Mining to find association rules. We demonstrate that this procedure has a number of advantages over traditional Association Mining: it handles numerical variables without a priori binning, and it generates variable groups that act as "hotspots" for statistically significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical questionnaire data and continuous variables generated from biological measurements, and included missing values. From this data set, we successfully generated a number of interesting association rules, each of which relates an observation to a specific consequence and reports the p-value for that finding. Additionally, we demonstrate that the method can be used on non-clinical data containing chemical-disease associations to find associations between different phenotypes, such as prostate cancer and breast cancer.
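
The two-stage idea in the abstract (a Self-Organizing Map first, Association Mining second) can be illustrated with a toy sketch. This is not the authors' implementation, and the abstract does not specify their algorithmic details. Everything here is an assumption made for illustration: the function names (`train_som_1d`, `best_matching_unit`, `hypergeom_tail`, `mine_rules`), the one-dimensional map over a single numeric variable, the made-up data, and the choice of a one-sided hypergeometric test to attach a p-value to each rule. The point is only to show how map nodes can stand in for a priori bins of a numeric variable, and how a rule of the form "record maps to node i, therefore outcome = target" can be scored.

```python
import math

def best_matching_unit(weights, v):
    """Index of the map node whose weight is closest to the value v."""
    return min(range(len(weights)), key=lambda j: abs(weights[j] - v))

def train_som_1d(values, n_nodes=3, epochs=50, lr0=0.5, radius0=1.0):
    """Train a tiny 1-D self-organizing map on scalar values (n_nodes >= 2)."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: nodes evenly spaced over the data range.
    weights = [lo + (hi - lo) * i / (n_nodes - 1) for i in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays linearly from 1 toward 0
        lr = lr0 * frac                        # learning rate for this epoch
        radius = max(radius0 * frac, 1e-3)     # neighbourhood width on the grid
        for v in values:
            bmu = best_matching_unit(weights, v)
            for j in range(n_nodes):
                # Gaussian neighbourhood: nodes near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                weights[j] += lr * h * (v - weights[j])
    return weights

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n of N records are drawn and K of the N have the outcome."""
    total = math.comb(N, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total

def mine_rules(values, outcomes, target, weights, min_conf=0.8):
    """Score rules 'record maps to node i => outcome == target' per map node."""
    N = len(values)
    K = sum(1 for o in outcomes if o == target)
    assigned = [best_matching_unit(weights, v) for v in values]
    rules = []
    for node in range(len(weights)):
        members = [o for a, o in zip(assigned, outcomes) if a == node]
        n = len(members)
        if n == 0:
            continue                           # empty node: no rule to score
        k = sum(1 for o in members if o == target)
        confidence = k / n
        if confidence >= min_conf:
            rules.append({
                "node": node,
                "support": n / N,
                "confidence": confidence,
                "p_value": hypergeom_tail(k, n, K, N),
            })
    return rules

# Hypothetical toy data: one numeric measurement and one categorical outcome.
values = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1, 9.0, 9.1, 8.9, 9.2]
outcomes = ["control"] * 8 + ["case"] * 4

weights = train_som_1d(values)
for rule in mine_rules(values, outcomes, "case", weights):
    print(rule)
```

In this sketch the numeric variable is never binned by hand; the trained node weights decide which records are grouped together, and only the resulting groups are passed to the rule-scoring step. A realistic application would need a multi-dimensional map, handling of categorical and missing values, and correction for multiple testing, none of which is shown here.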