Analysing microarray expression data through effective clustering

Authors:
E. Masciari, G. M. Mazzeo, C. Zaniolo
Venue:
Information Sciences: an International Journal
Year:
2014

References:
OPTICS: ordering points to identify the clustering structure. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (SIGMOD '99).
Data clustering: a review. ACM Computing Surveys (CSUR).
Data mining: concepts and techniques.
BIRCH: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery.
An Approach to Active Spatial Data Mining Based on Statistical Information. IEEE Transactions on Knowledge and Data Engineering.
STING: A Statistical Information Grid Approach to Spatial Data Mining. Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97).
k*-means: a new generalized k-means clustering algorithm. Pattern Recognition Letters.
Fast Detection of XML Structural Similarity. IEEE Transactions on Knowledge and Data Engineering.
Attribute Clustering for Grouping, Selection, and Classification of Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
The Nearest Subclass Classifier: A Compromise between the Nearest Mean and Nearest Neighbor Classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Iterative Cluster Analysis of Protein Interaction Data. Bioinformatics.
Bayesian hierarchical clustering. Proceedings of the 22nd International Conference on Machine Learning (ICML '05).
k-means++: the advantages of careful seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '07).
Techniques for clustering gene expression data. Computers in Biology and Medicine.
An improved algorithm for clustering gene expression data. Bioinformatics.
Modeling and Visualizing Uncertainty in Gene Expression Clusters Using Dirichlet Process Mixtures. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Noise-robust algorithm for identifying functionally associated biclusters from gene expression data. Information Sciences: an International Journal.
An agglomerative clustering algorithm using a dynamic k-nearest-neighbor list. Information Sciences: an International Journal.
Automatic summarisation and annotation of microarray data. Soft Computing: A Fusion of Foundations, Methodologies and Applications (special issue on advances in computational intelligence and bioinformatics).
A Coclustering Approach for Mining Large Protein-Protein Interaction Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Gene transposon based clone selection algorithm for automatic clustering. Information Sciences: an International Journal.

Abstract

Recent advances in genomic technologies and the availability of large-scale microarray datasets call for advanced data analysis techniques, such as data mining and statistical analysis. Among the mining techniques proposed so far, cluster analysis has become a standard method for analyzing microarray expression data. It can be used both for the initial screening of patients and for the extraction of molecular signatures of disease. Clustering can also help characterize genes of unknown function and uncover patterns that indicate the status of cellular processes. More generally, clustering biological data is useful not only for exploring the data but also for discovering implicit links between objects. Accordingly, several clustering approaches have been proposed that seek a good trade-off between the accuracy and the efficiency of the clustering process. In particular, hierarchical clustering algorithms have received great attention for their accuracy in the unsupervised identification and stratification of groups of similar genes or patients, while partition-based approaches are used when fast computation is required. However, it is well known that no existing clustering algorithm fully satisfies both accuracy and efficiency requirements; consequently, a clustering algorithm should be evaluated against external criteria that are independent of the metric used to compute the clusters. In this paper, we propose a clustering algorithm called M-CLUBS (Microarray data CLustering Using Binary Splitting) that is more accurate than the hierarchical algorithms proposed so far while running faster than partition-based approaches. As the experimental section shows, M-CLUBS is both faster and more accurate than other algorithms, including k-means and its recently proposed refinements.
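The abstract does not say which external criteria are meant. As one common example (an assumption, not necessarily a measure used in the paper), the Rand index compares a clustering with reference labels, such as known patient classes, by counting the sample pairs on which the two labelings agree. Because it only looks at label identity, it does not depend on the distance used to build the clusters. The helper name `rand_index` and the example labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which two labelings agree.

    A pair agrees if both labelings put its two samples in the same
    cluster, or both put them in different clusters. Only label identity
    is used, so the score is independent of the metric that produced
    labels_pred.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    agree = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred
    return agree / (n * (n - 1) // 2)


if __name__ == "__main__":
    # Hypothetical reference classes vs. cluster ids from some algorithm.
    reference = ["tumor", "tumor", "normal", "normal"]
    clusters = [1, 1, 0, 2]
    print(rand_index(reference, clusters))
```

With these labels, 5 of the 6 pairs agree, so the script prints 5/6 (about 0.833); the only disagreement is the pair of "normal" samples, which the clustering places in clusters 0 and 2.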
M-CLUBS consists of a divisive phase and an agglomerative phase; in both phases, samples are repartitioned using a least quadratic distance criterion whose analytical properties we exploit to achieve very fast computation. M-CLUBS derives good clusters without requiring user input and is robust to noise, while providing better speed and accuracy than methods with the same critical properties, such as BIRCH. Because microarray data are represented as arrays of numeric values and M-CLUBS is designed to perform well with Euclidean distances, it is well suited to analyzing them. To strengthen these results, the obtained clusters were interpreted by a domain expert and evaluated using quality measures specifically tailored to assessing biological validity.
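The abstract does not spell out the criterion, so the following is a minimal sketch under one assumption: that the "least quadratic distance criterion" is the within-cluster sum of squared Euclidean distances to the centroid (SSQ). It is not the M-CLUBS implementation; it only illustrates why such a criterion is cheap to evaluate. Any cluster can be summarized by its count n, linear sum LS and sum of squared norms SS (the same triple BIRCH keeps in its clustering features), and SSQ = SS - ||LS||^2 / n. A candidate binary split can therefore be scored from prefix sums, and the SSQ increase caused by merging two clusters a and b is given by Ward's formula n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2. The names `Summary`, `merge_cost` and `best_axis_split` are hypothetical.

```python
import numpy as np


class Summary:
    """Count, linear sum and sum of squared norms of a set of points."""

    def __init__(self, n, ls, ss):
        self.n = n
        self.ls = np.asarray(ls, dtype=float)
        self.ss = float(ss)

    @classmethod
    def of(cls, points):
        pts = np.asarray(points, dtype=float)
        return cls(len(pts), pts.sum(axis=0), (pts ** 2).sum())

    def __add__(self, other):
        # Merging two clusters only requires adding their summaries.
        return Summary(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    def ssq(self):
        # Within-cluster SSQ = SS - ||LS||^2 / n. This form can lose
        # precision on large, uncentered values; centering the data helps.
        if self.n == 0:
            return 0.0
        return self.ss - float(self.ls @ self.ls) / self.n


def merge_cost(a, b):
    """Increase in total SSQ caused by merging clusters a and b (Ward)."""
    diff = a.ls / a.n - b.ls / b.n
    return a.n * b.n / (a.n + b.n) * float(diff @ diff)


def best_axis_split(points):
    """Axis-aligned binary cut minimizing SSQ(left) + SSQ(right).

    points: array of shape (n, D). Prefix sums let each candidate cut be
    scored in O(D). Returns (dimension, threshold, total_ssq), or None if
    no cut separates the points.
    """
    pts = np.asarray(points, dtype=float)
    n, dims = pts.shape
    best = None
    for d in range(dims):
        srt = pts[np.argsort(pts[:, d], kind="stable")]
        ls = np.cumsum(srt, axis=0)              # prefix linear sums
        ss = np.cumsum((srt ** 2).sum(axis=1))   # prefix squared norms
        for k in range(1, n):                    # left part = first k points
            if srt[k - 1, d] == srt[k, d]:
                continue  # no threshold separates equal coordinates
            left = Summary(k, ls[k - 1], ss[k - 1])
            right = Summary(n - k, ls[-1] - ls[k - 1], ss[-1] - ss[k - 1])
            cost = left.ssq() + right.ssq()
            if best is None or cost < best[2]:
                threshold = float((srt[k - 1, d] + srt[k, d]) / 2)
                best = (d, threshold, cost)
    return best


if __name__ == "__main__":
    a = np.array([[0.0, 0.0], [0.0, 1.0]])
    b = np.array([[4.0, 0.0], [4.0, 1.0]])
    sa, sb = Summary.of(a), Summary.of(b)
    # The closed-form merge cost matches the SSQ increase from the summaries.
    print((sa + sb).ssq() - sa.ssq() - sb.ssq(), merge_cost(sa, sb))
    print(best_axis_split(np.vstack([a, b])))
```

Running the script prints `16.0 16.0`, confirming that the merge cost agrees with the SSQ increase, followed by `(0, 2.0, 1.0)`: the best cut separates the two groups along the first axis at 2.0. How M-CLUBS chooses which cluster to split next, when to stop splitting, and which clusters to merge is not described in the abstract and is left out of this sketch.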