Shared farthest neighbor approach to clustering of high dimensionality, low cardinality data

Authors:
Stefano Rovetta; Francesco Masulli
Affiliations:
Department of Computer and Information Sciences and CNISM, University of Genova, Via Dodecaneso 35 I-16146 Genova, Italy (both authors)
Venue:
Pattern Recognition
Year:
2006

Abstract

Clustering algorithms are routinely used in biomedical disciplines and are a basic tool in bioinformatics. Depending on the task at hand, the two most popular options are central partitional techniques and agglomerative hierarchical clustering techniques, together with their derivatives. These methods are well studied and well established. However, both categories have drawbacks: partitional algorithms have limitations related to data dimensionality, and hierarchical agglomerative algorithms have limitations related to their bottom-up structure. To overcome these limitations, and motivated by the problem of gene expression analysis with DNA microarrays, we present a hierarchical clustering algorithm based on an entirely different principle: the analysis of shared farthest neighbors. We present a framework for clustering using ranks and indexes, and we introduce the shared farthest neighbors (SFN) clustering criterion. We illustrate the properties of the method and present experimental results on several data sets, evaluating the resulting clusterings against extrinsic knowledge given by class labels.
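To make the idea of "shared farthest neighbors" concrete, here is a minimal illustrative sketch. It is not the authors' SFN algorithm, whose exact rank-based criterion and hierarchical procedure are given in the paper. Instead, it assumes a simple reading of the principle: each point is described by the set of its k farthest points (Euclidean distance), two points are considered similar when those sets overlap, and in the simplest case (k = 1) points are grouped when they share the same single farthest neighbor. The function names `farthest_neighbors`, `shared_farthest_similarity` and `group_by_farthest` are hypothetical.

```python
import numpy as np


def farthest_neighbors(X, k):
    """Return, for each row of X, the indices of its k farthest rows (Euclidean).

    Assumes 1 <= k < n, so a point's own index (distance 0) is never selected.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))  # full n x n distance matrix
    order = np.argsort(-D, axis=1, kind="stable")  # descending distance
    return order[:, :k]


def shared_farthest_similarity(X, k):
    """Hypothetical similarity: size of the overlap of k-farthest-neighbor sets."""
    sets = [set(row.tolist()) for row in farthest_neighbors(X, k)]
    n = len(sets)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            S[i, j] = len(sets[i] & sets[j])
    return S


def group_by_farthest(X):
    """Toy partition (k = 1): points that share the same farthest neighbor."""
    far = farthest_neighbors(X, 1)[:, 0]
    groups = {}
    for i, f in enumerate(far):
        groups.setdefault(int(f), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Two tight 1-D groups: points near 0 all have 10.2 as farthest neighbor,
    # points near 10 all have 0.0 as farthest neighbor.
    X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
    print(group_by_farthest(X))
    print(shared_farthest_similarity(X, 2))
```

On this toy input the k = 1 grouping returns `[[0, 1, 2], [3, 4, 5]]`, and with k = 2 points in the same group share both farthest neighbors while points in different groups share none. The brute-force distance matrix keeps the sketch short, and it fits the high-dimensionality, low-cardinality setting where the number of samples n is small.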