A methodology for handling a new kind of outliers present in gene expression patterns

Authors:
Anindya Bhattacharya; Rajat K. De
Affiliations:
Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India; Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Venue:
PReMI'11: Proceedings of the 4th International Conference on Pattern Recognition and Machine Intelligence
Year:
2011

Abstract

The performance of clustering algorithms depends largely on the selected similarity measure, and efficiency in handling outliers is a major contributor to the effectiveness of a similarity measure. In the present work, we discuss the problem of handling outliers with different existing similarity measures and introduce the concept of a new kind of outlier present in gene expression patterns. We formulate a new similarity, incorporate it into the Euclidean distance and the Pearson correlation coefficient, and then use the resulting measures in various clustering algorithms to group different gene expression profiles. The results are assessed using functional annotation. For performance comparison, several existing similarity measures in their traditional form are also used with the clustering algorithms. The results suggest that the new similarity improves the performance of all the clustering algorithms considered, in terms of finding biologically relevant groups of genes.
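The abstract does not define the new kind of outlier or the new similarity, so the following Python sketch only illustrates the general problem the work starts from: how a single outlying value distorts the traditional Euclidean distance and Pearson correlation coefficient between two gene expression profiles. The profiles (gene_a, gene_b, gene_b_spike, gene_c) are hypothetical toy values, not data from the paper, and the functions are plain textbook definitions, not the authors' method.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Traditional Euclidean distance between two expression profiles."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Traditional Pearson correlation coefficient between two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]  # deviations from the mean of x
    dy = [yi - my for yi in y]  # deviations from the mean of y
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical toy profiles over five experimental conditions.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.1, 2.1, 2.9, 4.2, 5.0]         # same pattern of variation as gene_a
gene_b_spike = [1.1, 2.1, 20.0, 4.2, 5.0]  # gene_b with one outlying value
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]         # opposite pattern to gene_a

for name, g in [("b", gene_b), ("b_spike", gene_b_spike), ("c", gene_c)]:
    print(f"a vs {name}: euclidean={euclidean(gene_a, g):.3f} "
          f"pearson={pearson(gene_a, g):.3f}")
```

With these toy values, one spike lowers the Pearson correlation between gene_a and gene_b from about 0.997 to about 0.2, and raises their Euclidean distance above the distance between gene_a and the oppositely varying gene_c. A clustering algorithm using either traditional measure could therefore separate two genes that otherwise share the same pattern of variation.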