Using clustering to learn distance functions for supervised similarity assessment
MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
Assessing the similarity between objects is a prerequisite for many data mining techniques. This paper introduces a novel approach to learning distance functions that maximize the clustering of objects belonging to the same class. Objects in a data set are clustered with respect to a given distance function, and the local class density information of each cluster is then used by a weight-adjustment heuristic to modify the distance function so that class density in the attribute space increases. This process of interleaving clustering with distance function modification is repeated until a "good" distance function has been found. We implemented our approach using the k-means clustering algorithm. We evaluated it on seven UCI data sets with a traditional 1-nearest-neighbor (1-NN) classifier and with a compressed 1-NN classifier, called NCC, that uses the learned distance function and the cluster centroids instead of all the points of the training set. The experimental results show that attribute weighting leads to statistically significant improvements in prediction accuracy over a traditional 1-NN classifier for two of the seven data sets tested, whereas using NCC significantly improves the accuracy of the 1-NN classifier for four of the seven data sets.
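The interleaving of clustering with distance-function modification described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the names (`kmeans`, `learn_distance_weights`), the parameters, and in particular the purity-based weight-adjustment rule are assumptions chosen for clarity; the paper's own heuristic may differ.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means on an (already weighted) feature space."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels

def learn_distance_weights(X, y, k=2, rounds=5, eta=0.5):
    """Interleave clustering with attribute-weight adjustment.

    Illustrative heuristic (our assumption, not the paper's exact
    update): inside impure clusters, raise the weight of attributes
    whose per-class means are well separated, lower the rest, and
    renormalise so the average weight stays 1.
    """
    w = np.ones(X.shape[1])
    for _ in range(rounds):
        # Cluster under the current weighted Euclidean distance.
        _, labels = kmeans(X * np.sqrt(w), k)
        score = np.zeros(X.shape[1])
        for j in range(k):
            members = labels == j
            classes = np.unique(y[members])
            if members.sum() < 2 or len(classes) < 2:
                continue  # pure or tiny cluster: no evidence either way
            # Between-class spread of attribute means inside this cluster.
            means = np.array([X[members & (y == c)].mean(axis=0)
                              for c in classes])
            score += means.std(axis=0)
        if score.sum() == 0:
            break  # every cluster is pure: the distance function is "good"
        w *= (1 - eta) + eta * score / score.mean()
        w *= len(w) / w.sum()  # keep the average weight at 1
    return w
```

On a toy data set where one attribute carries the class structure and a second attribute is pure noise with a large range (so the unweighted clustering initially splits on the noise), the learned weight of the informative attribute ends up higher than that of the noise attribute.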