Relevant attribute discovery in high dimensional data: application to breast cancer gene expressions

Authors:
Julio J. Valdés; Alan J. Barton
Affiliations:
National Research Council Canada, M50, Ottawa; National Research Council Canada, M50, Ottawa
Venue:
RSKT'06 Proceedings of the First international conference on Rough Sets and Knowledge Technology
Year:
2006

Citing (5):

Rough Sets: Theoretical Aspects of Reasoning about Data

Clustering Algorithms

Gene discovery in leukemia revisited: a computational intelligence perspective
IEA/AIE'2004 Proceedings of the 17th international conference on Innovations in applied artificial intelligence

Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence (Studies in Computational Intelligence)

Virtual reality representation of information systems and decision rules: an exploratory technique for understanding data and knowledge structure
RSFDGrC'03 Proceedings of the 9th international conference on Rough sets, fuzzy sets, data mining, and granular computing

Cited by (4):

Degrees of conditional (in)dependence: A framework for approximate Bayesian networks and examples related to the rough set-based feature selection
Information Sciences: an International Journal

Rough set theory with discriminant analysis in analyzing electricity loads
Expert Systems with Applications: An International Journal

Roughfication of numeric decision tables: the case study of gene expression data
RSKT'07 Proceedings of the 2nd international conference on Rough sets and knowledge technology

Rough set based maximum relevance-maximum significance criterion and Gene selection from microarray data
International Journal of Approximate Reasoning

Abstract

In many domains, data objects are described by a large number of features. The pipelined data mining approach introduced in [1], which uses two clustering algorithms in combination with rough sets and is extended with genetic programming, is investigated with the purpose of discovering important subsets of attributes in high dimensional data. The classification ability of these subsets is described in terms of both collections of rules and analytic functions obtained by genetic programming (gene expression programming). The Leader and several k-means algorithms are used to simplify the attribute sets of the information systems that are later presented to rough set algorithms. Visual data mining techniques, including virtual reality, were used for inspecting results. The data mining process is set up using high-throughput distributed computing techniques. This approach was applied to breast cancer microarray data and led to subsets of genes with high discrimination power with respect to the decision classes.
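
The attribute set simplification step mentioned in the abstract can be illustrated with a minimal sketch of the classical single-pass Leader algorithm applied to attributes (columns) rather than samples. This is not the authors' implementation: the Euclidean distance, the threshold value, the toy data, and the names `leader_clustering` and `transpose` are all assumptions made for illustration, and the abstract does not state which similarity measure or parameters were used.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed measure)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_clustering(vectors, threshold, dist=euclidean):
    """Single-pass Leader clustering.

    Each vector joins the first existing leader within `threshold`;
    otherwise it becomes a new leader. Returns (leaders, assignment):
    `leaders` holds the indices of the vectors chosen as leaders, and
    `assignment[i]` is the position in `leaders` of vector i's cluster.
    """
    leaders = []
    assignment = []
    for i, v in enumerate(vectors):
        for pos, j in enumerate(leaders):
            if dist(v, vectors[j]) <= threshold:
                assignment.append(pos)
                break
        else:
            # No leader was close enough: this vector starts a new cluster.
            leaders.append(i)
            assignment.append(len(leaders) - 1)
    return leaders, assignment


def transpose(rows):
    """Turn a samples-by-attributes table into a list of attribute vectors."""
    return [list(col) for col in zip(*rows)]


# Toy table: 4 samples (rows) x 5 attributes (columns); values are made up.
samples = [
    [1.0, 1.1, 5.0, 5.2, 9.0],
    [2.0, 2.1, 4.0, 4.1, 9.5],
    [3.0, 2.9, 3.0, 3.1, 8.5],
    [4.0, 4.2, 2.0, 1.9, 9.1],
]

# Cluster the attributes and keep only the leader attribute of each cluster.
attributes = transpose(samples)
leaders, assignment = leader_clustering(attributes, threshold=0.5)
reduced = [[row[j] for j in leaders] for row in samples]
```

In this sketch, `reduced` is the smaller table that would be passed on to a later stage (rough set analysis, in the pipeline the abstract describes). Note that the result of the Leader algorithm depends on the order in which attributes are presented and on the chosen threshold.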