Effect of feature-type in selecting distance measure for an artificial immune system as a pattern recognizer

Authors:
Seral Özşen;Salih Güneş
Affiliations:
Electrical and Electronics Engineering Department, Selcuk University, 42035 Konya, Turkey;Electrical and Electronics Engineering Department, Selcuk University, 42035 Konya, Turkey
Venue:
Digital Signal Processing
Year:
2008

Abstract

In designing an artificial immune system (AIS) for a problem domain, one must first determine a representation type and then select a distance measure to compute the affinity between system units and input data. Euclidean distance is commonly used in many proposed methods, chosen either intuitively or for its simplicity of implementation. This selection, however, must be made carefully, considering the properties of the problem domain. For example, most problems use data vectors that mix discrete, real-valued, and nominal feature values. Although Euclidean distance can be used in such problems, similarity measures designed for these hybrid vectors would give better results. To draw the attention of AIS designers to this point, we tested three distance criteria (Euclidean distance, Manhattan distance, and a hybrid similarity measure) on a simple AIS for the classification of two medical datasets taken from the UCI machine learning repository. One of the datasets, Statlog heart disease, contains vectors with nominal, discrete, and real-valued features, while the other, BUPA liver disorders, consists of purely real-valued vectors. For the Statlog dataset, the best classification result was obtained with the hybrid similarity measure, as expected, because this dataset contains all three types of features. Results for the BUPA dataset did not differ much across the measures used, which is also expected given the nature of this dataset.
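
The contrast between the three distance criteria can be illustrated with a minimal sketch. The abstract does not define the hybrid similarity measure, so the code below substitutes a generic heterogeneous distance as a stand-in: an overlap term (0 if equal, 1 otherwise) for nominal features and a range-normalized absolute difference for numeric features. This is an assumption, not the authors' formula. The function names, the example vectors, and the feature ranges are all hypothetical and chosen only for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance; treats every feature as a number."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (city-block) distance; treats every feature as a number."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hybrid_distance(a, b, nominal, ranges):
    """Stand-in heterogeneous distance (assumed, not the paper's measure).

    nominal: set of indices of nominal features (compared by equality only).
    ranges:  dict mapping each numeric feature index to its value range,
             used to scale numeric differences into [0, 1].
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in nominal:
            d = 0.0 if x == y else 1.0      # overlap term for category codes
        else:
            r = ranges[i]
            d = abs(x - y) / r if r else 0.0  # range-normalized difference
        total += d * d
    return math.sqrt(total)

# Hypothetical vectors: (age, chest-pain type code, resting blood pressure).
# Index 1 is a nominal category code, so its numeric value carries no order.
u = (63.0, 1, 145.0)
v = (60.0, 4, 145.0)
w = (60.0, 2, 145.0)   # same as v except for a different category code

nominal = {1}
ranges = {0: 50.0, 2: 100.0}  # assumed feature ranges, illustration only

if __name__ == "__main__":
    for name, other in (("v", v), ("w", w)):
        print(name,
              round(euclidean(u, other), 4),
              round(manhattan(u, other), 4),
              round(hybrid_distance(u, other, nominal, ranges), 4))
```

In this sketch, Euclidean and Manhattan distance rank `w` as closer to `u` than `v` purely because category code 2 is numerically nearer to 1 than code 4 is, even though the codes have no inherent order. The stand-in hybrid distance treats both as a plain category mismatch and assigns them the same distance, which is the kind of behavior a measure designed for mixed feature types is meant to provide.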