Capturing best practice for microarray gene expression data analysis

Authors:
Gregory Piatetsky-Shapiro;Tom Khabaza;Sridhar Ramaswamy
Affiliations:
KDnuggets;SPSS;MIT/Whitehead Institute
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Citing 1

Wrappers for feature subset selection
Artificial Intelligence - Special issue on relevance

Cited 6

Attribute Clustering for Grouping, Selection, and Classification of Gene Expression Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Integrative Visual Data Mining of Biomedical Data: Investigating Cases in Chronic Fatigue Syndrome and Acute Lymphoblastic Leukaemia
Visual Data Mining

A Framework for Multi-class Learning in Micro-array Data Analysis
AIME '09 Proceedings of the 12th Conference on Artificial Intelligence in Medicine: Artificial Intelligence in Medicine

Capturing heuristics and intelligent methods for improving micro-array data classification
IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning

Gene selection by cooperative competition clustering
ICIC'06 Proceedings of the 2006 international conference on Computational Intelligence and Bioinformatics - Volume Part III

Integrating machine learning techniques into robust data enrichment approach and its application to gene expression data
International Journal of Data Mining and Bioinformatics

Abstract

Analyzing gene expression data from microarray devices has many important applications in medicine and biology, but presents significant challenges to data mining. Microarray data typically has many attributes (genes) and few examples (samples), making the process of correctly analyzing such data difficult to formulate and prone to common mistakes. For this reason, it is unusually important to capture and record good practices for this form of data mining. This paper presents a process for analyzing microarray data, including pre-processing, gene selection, randomization testing, classification and clustering; this process is captured with "Clementine Application Templates". The paper describes the process in detail and includes three case studies, showing how the process is applied to 2-class classification, multi-class classification and clustering analyses for publicly available microarray datasets.
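To make the gene selection and randomization testing steps named in the abstract more concrete, the following is a minimal Python sketch. It is not the paper's Clementine template. The scoring function (difference of class means divided by the sum of class standard deviations), the function names, the toy data and the permutation count are all assumptions chosen for illustration. The idea shown is that, with many genes and few samples, some gene will look discriminative by chance, so the best observed score is compared against the best scores obtained after shuffling the class labels.

```python
import random
import statistics


def gene_score(values, labels):
    """Absolute difference of the two class means over the sum of class std devs.

    This statistic is an illustrative assumption, not the paper's stated measure.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = statistics.stdev(a) + statistics.stdev(b)
    return abs(statistics.mean(a) - statistics.mean(b)) / (spread or 1e-12)


def select_genes(data, labels, top_k):
    """Rank genes (rows of data) by score and return the indices of the top_k."""
    scores = [gene_score(row, labels) for row in data]
    return sorted(range(len(data)), key=lambda g: scores[g], reverse=True)[:top_k]


def randomization_p_value(data, labels, n_permutations=200, seed=0):
    """Estimate how often shuffled labels give a best score >= the observed best."""
    rng = random.Random(seed)
    observed = max(gene_score(row, labels) for row in data)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real gene/class association
        if max(gene_score(row, shuffled) for row in data) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 3 genes (rows) by 8 samples (columns), two classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
data = [
    [1.0, 2.0, 1.5, 2.5, 1.2, 2.2, 1.4, 2.4],  # no class difference
    [0.9, 1.1, 1.0, 1.2, 5.0, 5.2, 4.9, 5.1],  # clearly higher in class 1
    [3.0, 1.0, 2.0, 4.0, 2.5, 3.5, 1.5, 2.0],  # no class difference
]

print("selected genes:", select_genes(data, labels, top_k=1))
print("randomization p-value:", randomization_p_value(data, labels))
```

In a real analysis the selection step would be repeated inside each cross-validation fold, so that the genes are never chosen using the samples on which the classifier is later evaluated.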