As data mining gains acceptance in the analysis of massive data sets, it is becoming clear that there is a need for algorithms that can handle not only the massive size, but also the high dimensionality of the data. Certain pattern recognition algorithms can become computationally intractable when the number of features reaches hundreds or even thousands, while others can break down when there are large correlations among the features. A common solution to these problems is to reduce the dimension, either in conjunction with the pattern recognition algorithm or independently of it. We describe how dimension reduction techniques can be applied in the context of a specific data mining application, namely, the classification of radio galaxies with a bent-double morphology. We discuss certain statistical and exploratory data analysis methods for reducing the number of features, and the subsequent improvements in the performance of decision tree and generalized linear model classifiers. We show that careful extraction and selection of features is necessary for the successful application of data mining techniques.
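One of the problems the abstract mentions is that classifiers can break down when features are highly correlated, and that reducing the dimension by discarding redundant features can help. As a minimal illustrative sketch of that idea (not the method used in the paper; the function name, threshold, and toy data are assumptions), a greedy filter can keep only features whose correlation with every already-kept feature stays below a cutoff:

```python
import numpy as np

def filter_correlated_features(X, threshold=0.95):
    """Greedily select feature indices, skipping any feature whose
    absolute correlation with an already-kept feature exceeds
    `threshold`. Illustrative only; real pipelines would combine
    this with domain knowledge or model-based selection."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

# Toy data: feature 1 is feature 0 plus tiny noise, so it is
# nearly perfectly correlated with feature 0 and gets dropped.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base,
               base + 1e-3 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 2))])
print(filter_correlated_features(X))  # drops the redundant feature 1
```

The surviving features could then be passed to a decision tree or generalized linear model, as in the classifiers the abstract evaluates.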