On comparing two sequences of numbers and its applications to clustering analysis

Authors:
R. J. G. B. Campello;E. R. Hruschka
Affiliations:
Department of Computer Sciences, University of São Paulo at São Carlos, SCC/ICMC/USP, CP 668, São Carlos, SP 13560-970, Brazil;Department of Computer Sciences, University of São Paulo at São Carlos, SCC/ICMC/USP, CP 668, São Carlos, SP 13560-970, Brazil
Venue:
Information Sciences: an International Journal
Year:
2009

Abstract

A conceptual problem that arises in several contexts of clustering analysis is that of measuring the degree of compatibility between two sequences of numbers. This problem is usually addressed by means of numerical indexes referred to as sequence correlation indexes. This paper elaborates on why some specific sequence correlation indexes may not be good choices, depending on the application scenario at hand. A variant of the Product-Moment correlation coefficient and a weighted formulation of the Goodman-Kruskal and Kendall indexes are derived that may be more appropriate for some particular application scenarios. The proposed and existing indexes are analyzed from different perspectives, such as their sensitivity to the ranks and magnitudes of the sequences under evaluation, among other relevant aspects of the problem. The results help suggest scenarios within the context of clustering analysis that are possibly more appropriate for the application of each index.
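
To make the distinction between rank sensitivity and magnitude sensitivity concrete, the following minimal sketch computes the three existing indexes named in the abstract using their standard textbook definitions. It does not reproduce the variant or the weighted formulation proposed in the paper, whose definitions are not given here. The function names and the example sequences are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Product-Moment (Pearson) correlation; sensitive to magnitudes."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def concordance_counts(a, b):
    """Count concordant and discordant pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return concordant, discordant


def goodman_kruskal_gamma(a, b):
    """Goodman-Kruskal gamma: (C - D) / (C + D); ignores tied pairs."""
    c, d = concordance_counts(a, b)
    return (c - d) / (c + d)


def kendall_tau_a(a, b):
    """Kendall's tau-a: (C - D) divided by the total number of pairs."""
    c, d = concordance_counts(a, b)
    n = len(a)
    return (c - d) / (n * (n - 1) / 2)


# Hypothetical sequences: same ordering, very different magnitudes.
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 3.0, 100.0]

print(pearson(a, b))                # below 1: the outlying magnitude matters
print(goodman_kruskal_gamma(a, b))  # 1.0: only the relative order matters
print(kendall_tau_a(a, b))          # 1.0: only the relative order matters
```

In this example the two rank-based indexes report perfect agreement because the sequences are ordered identically, while the Product-Moment coefficient drops below 1 because of the single large value. Gamma and tau-a differ from each other only when ties are present, since tau-a keeps tied pairs in the denominator.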