A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features

  • Authors:
  • Scott Cost; Steven Salzberg

  • Affiliations:
  • Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 (cost@cs.jhu.edu; salzberg@cs.jhu.edu)

  • Venue:
  • Machine Learning

  • Year:
  • 1993

Abstract

In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with other learning algorithms show that our nearest neighbor algorithm is comparable or superior in all three domains. In addition, our algorithm has advantages in training speed, simplicity, and perspicuity. We conclude that experimental evidence favors the use and continued development of nearest neighbor algorithms for domains such as the ones studied here.
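The distance tables the abstract mentions can be illustrated with a minimal sketch of the value-difference idea: two symbolic values of a feature are considered close when they induce similar class distributions in the training data. This is a simplified reading of the abstract, not the paper's full method; the instance weighting is omitted, and all function names here are illustrative.

```python
from collections import Counter, defaultdict

def value_difference_table(examples, labels, feature_idx, k=1):
    """Build a distance table d(v1, v2) for one symbolic feature.

    Simplified sketch: the distance between two values is the sum,
    over classes, of the absolute difference in the conditional class
    frequencies observed for those values (raised to power k).
    """
    # Count class occurrences for each value of this feature.
    value_class = defaultdict(Counter)
    for ex, y in zip(examples, labels):
        value_class[ex[feature_idx]][y] += 1

    classes = set(labels)
    table = {}
    for v1 in value_class:
        for v2 in value_class:
            n1 = sum(value_class[v1].values())
            n2 = sum(value_class[v2].values())
            table[(v1, v2)] = sum(
                abs(value_class[v1][c] / n1 - value_class[v2][c] / n2) ** k
                for c in classes
            )
    return table

def distance(x, y, tables):
    """Real-valued distance between two instances: sum of per-feature
    table lookups (unweighted sketch)."""
    return sum(tables[i][(x[i], y[i])] for i in range(len(x)))
```

For example, with training instances `[('a','x'), ('a','y'), ('b','x'), ('b','y')]` and labels `[0, 0, 1, 1]`, the first feature fully separates the classes, so its table assigns a large distance between `'a'` and `'b'`, while the second feature is uninformative and its values end up at distance zero. This is how symbolic values acquire graded, real-valued distances instead of a flat match/mismatch metric.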