Finding relevant attributes in high dimensional data: a distributed computing hybrid data mining strategy

Authors:
Julio J. Valdés;Alan J. Barton
Affiliations:
National Research Council Canada, Ottawa, ON;National Research Council Canada, Ottawa, ON
Venue:
Transactions on rough sets VI
Year:
2007


Abstract

In many domains, data objects are described by a large number of features (e.g., microarray experiments, or spectral characterizations of organic and inorganic samples). A pipelined approach using two clustering algorithms in combination with rough sets is investigated for the purpose of discovering important combinations of attributes in high dimensional data. The Leader algorithm and several k-means algorithms are used as fast procedures for simplifying the attribute sets of the information systems presented to the rough set algorithms. The data, described in terms of these fewer features, are then discretized with respect to the decision attribute according to different rough set based schemes. From the discretized data, reducts and their derived rules are extracted and applied to test data in order to evaluate the resulting classification accuracy in cross-validation experiments. The data mining process is implemented within a high throughput distributed computing environment. Nonlinear transformations of attribute subsets that preserve the similarity structure of the data were also investigated. Their classification ability, and that of the attribute subsets obtained after the mining process, were described in terms of analytic functions obtained by genetic programming (gene expression programming) and simplified using computer algebra systems. Visual data mining techniques using virtual reality were used for inspecting results. This approach was explored in a series of experiments using leukemia, colon cancer, and breast cancer gene expression data, which led to small subsets of genes with high discrimination power.
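
The attribute simplification step can be illustrated with a minimal sketch of single-pass Leader clustering applied to attributes (columns) rather than objects. This is not the authors' implementation: the similarity measure (absolute Pearson correlation), the threshold value, the toy data, and the names `pearson` and `leader_clustering` are all assumptions made for illustration. Each attribute joins the first existing leader it is sufficiently similar to; otherwise it becomes a new leader, and the leaders form the reduced attribute set.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def leader_clustering(columns, threshold):
    """Single-pass Leader clustering over attribute columns.

    columns   : list of attributes, each a list of values over the same objects
    threshold : assumed similarity cutoff (hypothetical parameter)

    Returns (leaders, assignment): the indices of the attributes chosen as
    leaders, and for every attribute the position of its leader in `leaders`.
    """
    leaders = []
    assignment = []
    for j, col in enumerate(columns):
        for k, lead in enumerate(leaders):
            # Assumed similarity: absolute correlation with the leader attribute.
            if abs(pearson(col, columns[lead])) >= threshold:
                assignment.append(k)
                break
        else:
            # No leader is similar enough, so this attribute starts a new cluster.
            leaders.append(j)
            assignment.append(len(leaders) - 1)
    return leaders, assignment


# Toy data: four attributes measured on four objects (made-up values).
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 5.9, 8.0],  # nearly proportional to attribute 0
    [4.0, 1.0, 3.0, 2.0],
    [8.0, 2.1, 6.0, 3.9],  # nearly proportional to attribute 2
]
leaders, assignment = leader_clustering(columns, threshold=0.95)
print(leaders, assignment)
```

Because the algorithm makes one pass and compares each attribute only against the current leaders, its result depends on the order of the attributes and on the threshold; a lower threshold yields fewer leaders and therefore a smaller information system for the subsequent rough set stage.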