A modified apriori algorithm for analysing high-dimensional gene data

Authors:
Claudia Pommerenke;Benedikt Friedrich;Thorsten Johl;Lothar Jänsch;Susanne Häussler;Frank Klawonn
Affiliations:
Infection Genetics, Helmholtz Centre for Infection Research, Brunswick, Germany;Computer Science, Ostfalia University of Applied Sciences, Wolfenbüttel, and Bioinformatics and Statistics Group, Helmholtz Centre for Infection Research, Brunswick, Germany;Cellular Proteomics, Helmholtz Centre for Infection Research, Brunswick, Germany;Cellular Proteomics, Helmholtz Centre for Infection Research, Brunswick, Germany;Cellular Proteomics, Helmholtz Centre for Infection Research, Brunswick, Germany;Computer Science, Ostfalia University of Applied Sciences, Wolfenbüttel, and Bioinformatics and Statistics Group, Helmholtz Centre for Infection Research, Brunswick, Germany
Venue:
IDEAL'11 Proceedings of the 12th international conference on Intelligent data engineering and automated learning
Year:
2011

Abstract

Modern high-throughput technologies allow the systematic characterisation of an organism but produce very large amounts of data, such as the results of microarray gene expression experiments. Combining the information from various experiments will help to expand the knowledge about an organism. However, analysing a data set that comprises measurements for thousands of genes under many conditions requires efficient techniques to be feasible at all. Here, we refine a frequent itemset mining approach for scanning a high-throughput data set in order to identify subsets of genes and subsets of conditions with similar data patterns. As a use case, screenings of 4699 mutant clones of Pseudomonas aeruginosa, each with a disrupted gene, were considered under 109 conditions. We found an unexpected gene group with highly overlapping phenotypes. Our approach is therefore suitable for simultaneously finding objects with similar patterns in high-dimensional data sets, together with their key characteristics, within reasonable time.
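
For readers unfamiliar with frequent itemset mining, the following is a minimal sketch of the textbook Apriori procedure, not the modified algorithm of the paper, whose details the abstract does not give. It assumes a hypothetical encoding in which each mutant clone is a transaction and each condition under which the clone shows a notable phenotype is an item; the names `apriori`, `supporting_genes`, `clones` and the toy data are illustrative only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets that occur in at
    least `min_support` transactions (textbook Apriori, level by level)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: a candidate survives only if all of its
        # (k-1)-subsets are frequent (the Apriori property).
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def supporting_genes(clones, itemset):
    """Genes whose set of conditions contains every condition in `itemset`."""
    return sorted(g for g, conds in clones.items() if set(itemset) <= conds)


# Hypothetical toy data: gene -> conditions with a notable phenotype.
clones = {
    "geneA": {"c1", "c2", "c3"},
    "geneB": {"c1", "c2"},
    "geneC": {"c1", "c2", "c3"},
    "geneD": {"c3"},
}
frequent = apriori(clones.values(), min_support=3)
shared = supporting_genes(clones, {"c1", "c2"})
```

In this toy example a frequent set of conditions and the genes that support it together form one gene subset and one condition subset with a shared pattern, which is the kind of result the abstract describes. The level-wise pruning is what keeps the search tractable when the number of conditions is large.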