LearnMet: learning domain-specific distance metrics for plots of scientific functions

Authors:
Aparna Varde;Elke Rundensteiner;Carolina Ruiz;Mohammed Maniruzzaman;Richard Sisson, Jr.
Affiliations:
Department of Math and Computer Science, Virginia State University, Petersburg, USA 23806;Department of Computer Science, Worcester Polytechnic Institute, Worcester, USA 01609;Department of Computer Science, Worcester Polytechnic Institute, Worcester, USA 01609;Center for Heat Treating Excellence, Metal Processing Institute, Worcester, USA 01609;Center for Heat Treating Excellence, Metal Processing Institute, Worcester, USA 01609
Venue:
Multimedia Tools and Applications
Year:
2007

Abstract

Scientific experimental results are often depicted as plots of functions to aid visual analysis and comparison. When these plots are compared computationally, using techniques such as similarity search and clustering, similarity is typically expressed as a distance. However, it is seldom known which distance metric or metrics best preserve semantics in the given domain. It is thus desirable to learn such domain-specific distance metrics for comparing plots. This paper describes LearnMet, a technique proposed to learn such metrics. The input to LearnMet is a training set of plots together with their actual clusters. These actual clusters are iteratively compared with clusters over the same plots predicted by an arbitrary but fixed clustering algorithm. Starting from a guessed initial metric, the metric is adjusted in each epoch based on the error between the predicted and actual clusters, until the error is minimal or falls below a given threshold. The metric giving the lowest error is output as the learned metric. The LearnMet technique and its enhancements are discussed in detail in this paper. The primary application of LearnMet is clustering plots in the Heat Treating domain, so it is rigorously evaluated using Heat Treating data. Given distinct test sets for evaluation, clusters of plots predicted using the learned metrics are compared with the given actual clusters over the same plots. The extent to which the predicted and actual clusters match denotes the accuracy of the learned metrics.
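
The learning loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the form of the metric, the clustering algorithm, the error measure, or the adjustment rule, so everything named here is hypothetical: the metric is assumed to be a weighted sum of two component distances (`euclidean` and `peak_gap`), clustering is a simple single-link agglomerative procedure, the error is the fraction of plot pairs on which predicted and actual clusters disagree, and the weight update is an illustrative heuristic.

```python
import itertools
import math

# Hypothetical component distances between two plots, each given as a list of
# y-values sampled at the same x-positions. The abstract does not name them.
def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def peak_gap(p, q):
    return abs(max(p) - max(q))

COMPONENTS = [euclidean, peak_gap]

def distance(p, q, weights):
    # Assumed metric form: weighted sum of the component distances.
    return sum(w * d(p, q) for w, d in zip(weights, COMPONENTS))

def cluster(plots, weights, k):
    # Stand-in for the "arbitrary but fixed" clustering algorithm:
    # single-link agglomerative merging until k clusters remain.
    groups = [[i] for i in range(len(plots))]
    while len(groups) > k:
        a, b = min(
            itertools.combinations(range(len(groups)), 2),
            key=lambda ab: min(distance(plots[i], plots[j], weights)
                               for i in groups[ab[0]] for j in groups[ab[1]]),
        )
        groups[a] += groups.pop(b)  # a < b, so index a stays valid
    labels = [0] * len(plots)
    for label, members in enumerate(groups):
        for i in members:
            labels[i] = label
    return labels

def pair_error(predicted, actual):
    # Assumed error measure: fraction of plot pairs whose "same cluster"
    # status differs between the predicted and the actual clustering.
    pairs = list(itertools.combinations(range(len(actual)), 2))
    wrong = sum((predicted[i] == predicted[j]) != (actual[i] == actual[j])
                for i, j in pairs)
    return wrong / len(pairs)

def learn_metric(plots, actual, k, weights=(1.0, 1.0),
                 threshold=0.0, max_epochs=20, rate=0.1):
    weights = list(weights)
    best_weights, best_error = list(weights), float("inf")
    for _ in range(max_epochs):
        predicted = cluster(plots, weights, k)
        error = pair_error(predicted, actual)
        if error < best_error:  # remember the metric with the lowest error
            best_weights, best_error = list(weights), error
        if error <= threshold:
            break
        # Illustrative update heuristic (an assumption): for each wrongly
        # clustered pair, shift each weight in proportion to that
        # component's share of the pair's unweighted distance.
        for i, j in itertools.combinations(range(len(plots)), 2):
            same_pred = predicted[i] == predicted[j]
            if same_pred == (actual[i] == actual[j]):
                continue
            parts = [d(plots[i], plots[j]) for d in COMPONENTS]
            total = sum(parts) or 1.0
            sign = 1.0 if same_pred else -1.0  # wrongly together: push apart
            for c, part in enumerate(parts):
                weights[c] = max(0.0, weights[c] + sign * rate * part / total)
    return best_weights, best_error

# Toy data: the actual clusters group plots by peak height, not peak position.
plots = [[0, 5, 0, 0], [0, 0, 5, 0], [0, 2, 0, 0], [0, 0, 2, 0]]
actual = [0, 0, 1, 1]
initial_error = pair_error(cluster(plots, [1.0, 1.0], 2), actual)
learned_weights, learned_error = learn_metric(plots, actual, k=2)
print(initial_error, learned_weights, learned_error)
```

Because the loop keeps the best weights seen so far, the returned error can never exceed the error of the initial guess. The accuracy evaluation mentioned at the end of the abstract would correspond to calling `cluster` with the learned weights on a separate test set and scoring the result against that set's actual clusters.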