Improving fuzzy clustering of biological data by metric learning with side information

Authors:
Michele Ceccarelli;Antonio Maratea
Affiliations:
Research Centre on Software Technology, RCOST University of Sannio, Via Traiano 11, 82100 Benevento, Italy;Research Centre on Software Technology, RCOST University of Sannio, Via Traiano 11, 82100 Benevento, Italy
Venue:
International Journal of Approximate Reasoning
Year:
2008

Abstract

Semi-supervised methods use a small amount of auxiliary information to guide learning in the presence of unlabeled data. In clustering, this auxiliary information takes the form of side information, that is, a list of co-clustered points. Recent literature shows that these methods outperform fully unsupervised ones even when only a small amount of side information is available. This suggests that semi-supervised methods may be especially useful in difficult, noisy tasks where little a priori information is available, as is the case for data derived from biological experiments. The two most frequently used paradigms for incorporating side information into clustering are constrained clustering and metric learning. In this paper we use a metric learning approach to improve classical fuzzy c-means clustering through a two-step procedure: first, a series of metrics (one for each cluster) that satisfy a randomly generated set of constraints is learnt from the data; then a generalized version of fuzzy c-means, using the metrics computed in the previous step, is executed. We show the benefits and limitations of this method using real-world datasets and a modified version of the Partition Entropy index.
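
The second step of the procedure can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each cluster's metric is given as a symmetric positive-definite matrix, so that distances are of Mahalanobis type; the abstract does not specify the form of the metrics. The metric learning step itself is not shown: identity matrices stand in as placeholders, which reduces the sketch to standard fuzzy c-means. The function names, the fuzzifier `m = 2`, and the synthetic data are illustrative assumptions. The entropy function is the classical Partition Entropy, not the modified version used in the paper.

```python
import numpy as np


def fuzzy_c_means_with_metrics(X, metrics, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy c-means in which cluster k uses its own metric matrix metrics[k].

    X       : (n, d) data matrix
    metrics : (c, d, d) symmetric positive-definite matrices, assumed to have
              been learnt beforehand (the learning step is NOT shown here)
    Returns the (c, n) membership matrix U and the (c, d) cluster centers
    used in the last membership update.
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(metrics, dtype=float)
    c, n = A.shape[0], X.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        # With positive-definite A_k the optimal center is the weighted mean.
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        diff = X[None, :, :] - centers[:, None, :]  # shape (c, n, d)
        # d2[k, i] = (x_i - v_k)^T A_k (x_i - v_k)
        d2 = np.einsum("kid,kde,kie->ki", diff, A, diff)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


def partition_entropy(U, eps=1e-12):
    """Classical Partition Entropy: lies in [0, log c], lower means crisper."""
    n = U.shape[1]
    return float(-(U * np.log(np.clip(U, eps, None))).sum() / n)


# Synthetic two-blob data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
metrics = np.stack([np.eye(2), np.eye(2)])  # placeholder for learnt metrics
U, centers = fuzzy_c_means_with_metrics(X, metrics)
pe = partition_entropy(U)
print("partition entropy:", pe)
```

Replacing the identity placeholders with matrices learnt from side information changes only the `metrics` argument; the center and membership updates stay the same under the positive-definiteness assumption.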