A methodology for handling a new kind of outliers present in gene expression patterns

  • Authors:
  • Anindya Bhattacharya;Rajat K. De

  • Affiliations:
  • Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India;Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India

  • Venue:
  • PReMI'11 Proceedings of the 4th international conference on Pattern recognition and machine intelligence
  • Year:
  • 2011

Abstract

The performance of clustering algorithms depends largely on the selected similarity measure. How well a similarity measure handles outliers is a major contributor to its effectiveness. In the present work, we discuss the problem of handling outliers with different existing similarity measures, and introduce the concept of a new kind of outlier present in gene expression patterns. We formulate a new similarity measure, incorporate it into Euclidean distance and the Pearson correlation coefficient, and then use the resulting measures in various clustering algorithms to group different gene expression profiles. The results are assessed using functional annotation. Different existing similarity measures in their traditional form are also used with the clustering algorithms for performance comparison. The results suggest that the new similarity measure improves the performance of all the considered clustering algorithms in terms of finding biologically relevant groups of genes.
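
To make the outlier problem concrete, the following minimal Python sketch shows how a single aberrant value can distort Euclidean distance and the Pearson correlation coefficient in their traditional form. The expression profiles `gene_a`, `gene_b`, and `gene_b_outlier` are hypothetical values invented for illustration, not data from the paper. The sketch illustrates only the general sensitivity of the traditional measures; it does not reproduce the new similarity measure or the new kind of outlier, neither of which is defined in this abstract.

```python
import math


def euclidean_distance(x, y):
    """Traditional Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Traditional Pearson correlation coefficient between two profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression profiles (illustrative values only).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [1.1, 2.1, 2.9, 4.2, 4.9, 6.1]

# Same profile as gene_b, but with one aberrant measurement.
gene_b_outlier = list(gene_b)
gene_b_outlier[2] = 30.0

for label, profile in [("clean", gene_b), ("with outlier", gene_b_outlier)]:
    d = euclidean_distance(gene_a, profile)
    r = pearson_correlation(gene_a, profile)
    print(f"{label}: distance = {d:.3f}, correlation = {r:.3f}")
```

In this toy case the two profiles agree at five of six points, yet the single outlier inflates the distance and drives the correlation toward zero, so a clustering algorithm using either measure unchanged would likely separate two genes that otherwise behave alike.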