In supervised learning, a training set of labeled instances is used by a learning algorithm to generate a model (classifier) that is subsequently employed to decide the class label of new instances (generalization). Characteristics of the training set, such as its size and the presence of noisy instances, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called a hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMNs correspond to properties of training points relevant to the one nearest neighbor (1-NN) decision rule, such as being a border or central point. This motivates the use of HMNs for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection: HMN-C, which removes instances without affecting the accuracy of 1-NN on the original training set; HMN-E, which performs more aggressive storage reduction; and HMN-EI, which applies HMN-E iteratively. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Experiments on these data sets show that the accuracy of the 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these data sets indicates that HMN-EI achieves the best generalization performance, with no significant difference in storage requirements. Overall, these results indicate that HMNs provide a powerful graph-based representation of a training set that can be successfully applied to noise and redundancy reduction in instance-based learning.
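To make the construction concrete, the following is a minimal sketch (not the authors' implementation) of building an HMN as the abstract describes it: a directed graph with one edge from every training instance to its nearest neighbor within each class, labeled "hit" when the neighbor shares the instance's label and "miss" otherwise. Euclidean distance, the function name build_hmn, and the edge representation are assumptions chosen for illustration.

```python
import numpy as np

def build_hmn(X, y):
    """Sketch of a hit miss network (assumed interface, not the paper's code):
    for every instance, add a directed edge to its nearest neighbor within
    each class ("hit" if the classes match, "miss" otherwise).
    Returns a list of (source, target, kind) triples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    edges = []
    for i in range(len(X)):
        dists = np.linalg.norm(X - X[i], axis=1)  # Euclidean distance to all points
        for c in np.unique(y):
            members = np.flatnonzero(y == c)
            members = members[members != i]       # an instance is not its own neighbor
            if members.size == 0:                 # degenerate single-point class
                continue
            j = members[np.argmin(dists[members])]
            edges.append((i, j, "hit" if y[i] == c else "miss"))
    return edges

# Toy usage: instances collecting many "miss" edges (high miss in-degree) tend
# to lie near class boundaries, the kind of structural property the HMN-based
# selection algorithms exploit to distinguish border from central points.
X = [[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]]
y = [0, 0, 0, 1, 1, 1]
miss_in = {}
for src, dst, kind in build_hmn(X, y):
    if kind == "miss":
        miss_in[dst] = miss_in.get(dst, 0) + 1
print(miss_in)  # {3: 3, 1: 2, 2: 1} for this toy set
```

Under these assumptions, each instance has exactly one outgoing edge per class, so the network stays compact (number of edges linear in the number of instances times the number of classes), which is consistent with the abstract's claim of a compact description.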