As data mining gains acceptance in the analysis of massive data sets, it is becoming clear that there is a need for algorithms that can handle not only the massive size, but also the high dimensionality of the data. Certain pattern recognition algorithms can become computationally intractable when the number of features reaches hundreds or even thousands, while others can break down when there are large correlations among the features. A common solution to these problems is to reduce the dimension, either in conjunction with the pattern recognition algorithm or independently of it. We describe how dimension reduction techniques can be applied in the context of a specific data mining application, namely, the classification of radio galaxies with a bent-double morphology. We discuss certain statistical and exploratory data analysis methods for reducing the number of features, and the subsequent improvements in the performance of decision tree and generalized linear model classifiers. We show that careful extraction and selection of features is necessary for the successful application of data mining techniques.
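One of the problems the abstract mentions is that classifiers can break down when features are highly correlated, and that reducing the dimension by discarding redundant features can help. As a minimal illustrative sketch of that idea (not the method used in the paper; the function name, threshold, and toy data are assumptions), a greedy filter can keep only features whose correlation with every already-kept feature stays below a cutoff:

```python
import numpy as np

def filter_correlated_features(X, threshold=0.95):
    """Greedily select feature indices, skipping any feature whose
    absolute correlation with an already-kept feature exceeds
    `threshold`. Illustrative only; real pipelines would combine
    this with domain knowledge or model-based selection."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

# Toy data: feature 1 is feature 0 plus tiny noise, so it is
# nearly perfectly correlated with feature 0 and gets dropped.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base,
               base + 1e-3 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 2))])
print(filter_correlated_features(X))  # drops the redundant feature 1
```

The surviving features could then be passed to a decision tree or generalized linear model, as in the classifiers the abstract evaluates.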