A scatter method for data and variable importance evaluation

Authors:
Martti Juhola; v. Siermala
Affiliations:
Computer Science, School of Information Sciences, 33014 University of Tampere, Finland (both authors)
Venue:
Integrated Computer-Aided Engineering
Year:
2012

Abstract

We designed an algorithm to examine the importance of variables in data sets for variable evaluation and weighting. In particular, it is designed to evaluate whether a data set contains information that is useful for separating the classes in classification and prediction. Such an evaluation can be performed for an entire data set or separately for individual classes or variables. The scatter method is based on traversing a data set from each case to a near neighbour case and counting class changes, i.e., the occasions when the class differs between successive near cases. The fewer the changes, the more compact the classes are in the variable space, and the more likely they are to be separable with high classification accuracy. We tested the method on several data sets of medical origin. The results showed that the scatter method can be used to explore how separable the classes in these data sets are, which is useful for variable evaluation and weighting.
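The traversal described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance, a fixed starting case, greedy stepping to the nearest unvisited case, and normalisation of the change count by its maximum n - 1; the paper's exact traversal, distance measure and normalisation may differ. The function names `scatter_changes`, `scatter_value` and `variable_scatter` are hypothetical and used only for illustration. Evaluating a single variable is sketched by running the same traversal on that variable alone.

```python
import math
from typing import List, Sequence


def scatter_changes(X: Sequence[Sequence[float]], y: Sequence[int], start: int = 0) -> int:
    """Greedy nearest-neighbour traversal that counts class changes.

    Assumptions (not taken from the paper): Euclidean distance, a fixed
    starting case `start`, and ties resolved in favour of the lowest index.
    """
    n = len(X)
    visited = [False] * n
    current = start
    visited[current] = True
    changes = 0
    for _ in range(n - 1):
        # Find the nearest case that has not been visited yet.
        best, best_d = -1, math.inf
        for j in range(n):
            if not visited[j]:
                d = math.dist(X[current], X[j])
                if d < best_d:
                    best, best_d = j, d
        # A class change occurs when the next case belongs to another class.
        if y[best] != y[current]:
            changes += 1
        visited[best] = True
        current = best
    return changes


def scatter_value(X: Sequence[Sequence[float]], y: Sequence[int], start: int = 0) -> float:
    """Normalise the change count by n - 1 (assumed) to obtain a value in [0, 1]."""
    n = len(X)
    if n < 2:
        return 0.0
    return scatter_changes(X, y, start) / (n - 1)


def variable_scatter(X: Sequence[Sequence[float]], y: Sequence[int], start: int = 0) -> List[float]:
    """Evaluate each variable separately by traversing on that column only."""
    m = len(X[0])
    return [scatter_value([[row[k]] for row in X], y, start) for k in range(m)]


if __name__ == "__main__":
    # Toy data: variable 0 separates the two classes, variable 1 does not.
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 3.0], [1.0, 2.0], [1.1, 4.0], [1.2, 0.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(variable_scatter(X, y))  # [0.2, 1.0]
```

In this toy example the first variable yields one class change in five steps, while the second alternates classes at every step. A lower value therefore indicates a variable along which the classes are more compact, which is the intuition the abstract uses for variable evaluation and weighting.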