Advances in Instance Selection for Instance-Based Learning Algorithms

Authors:
Henry Brighton; Chris Mellish
Affiliations:
Language Evolution and Computation Research Unit, Department of Theoretical and Applied Linguistics, The University of Edinburgh, Edinburgh, EH8 9LL, UK. henryb@ling.ed.ac.uk; Department of Artificial Intelligence, The University of Edinburgh, Edinburgh, EH1 1HN, UK. chrism@dai.ed.ac.uk
Venue:
Data Mining and Knowledge Discovery
Year:
2002


Abstract

The basic nearest neighbour classifier suffers from the indiscriminate storage of all presented training instances. With a large database of instances, classification response time can be slow. When noisy instances are present, classification accuracy can suffer. Drawing on the large body of relevant work carried out in the past 30 years, we review the principal approaches to solving these problems. Deleting instances can alleviate both problems, but the deletion criterion is typically assumed to be all-encompassing and effective over many domains. We argue against this position and introduce an algorithm that rivals the most successful existing algorithm. When evaluated on 30 different problems, neither algorithm consistently outperforms the other: consistent performance across domains is very hard to achieve. To achieve the best results, we need to develop mechanisms that provide insights into the structure of class definitions. We discuss the possibility of these mechanisms and propose some initial measures that could be useful for the data miner.
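
To make the idea of deleting instances concrete, the sketch below shows one simple deletion criterion for a nearest neighbour classifier: discard any stored instance whose label disagrees with the majority vote of its k nearest neighbours. This is a generic noise-editing rule in the spirit of classic edited nearest neighbour methods, not the algorithm introduced in the paper. The function names, the toy data, and the choice of k are assumptions made purely for illustration.

```python
from collections import Counter
from math import dist


def k_nearest_labels(query, instances, k):
    """Return the labels of the k stored instances closest to `query`."""
    ranked = sorted(instances, key=lambda inst: dist(query, inst[0]))
    return [label for _, label in ranked[:k]]


def classify(query, instances, k=1):
    """Plain k-nearest-neighbour majority vote over the stored instances."""
    votes = Counter(k_nearest_labels(query, instances, k))
    return votes.most_common(1)[0][0]


def edit_noisy_instances(instances, k=3):
    """Keep only instances whose label matches the vote of their k neighbours.

    Hypothetical helper: each instance is classified using all the others,
    and is dropped if the vote disagrees with its stored label.
    """
    kept = []
    for i, (point, label) in enumerate(instances):
        others = instances[:i] + instances[i + 1:]
        if classify(point, others, k) == label:
            kept.append((point, label))
    return kept


# Toy data (assumed): two well-separated clusters plus one mislabelled point.
training = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"), ((6.0, 6.0), "b"),
    ((0.5, 0.5), "b"),  # noisy instance inside the "a" cluster
]
edited = edit_noisy_instances(training, k=3)
```

On this toy set the editing pass removes only the mislabelled point, so a 1-NN query near it is no longer misled, and the stored set is slightly smaller. A single fixed criterion like this one is exactly the kind of rule the abstract argues cannot be assumed to work well across all domains.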