Clustering nodes in large-scale biological networks using external memory algorithms

  • Authors:
  • Ahmed Shamsul Arefin; Mario Inostroza-Ponta; Luke Mathieson; Regina Berretta; Pablo Moscato

  • Affiliations:
  • Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, New South Wales, Australia; Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Chile; Department of Computing, Faculty of Science, Macquarie University, Sydney, Australia; Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, New South Wales and Hunter Medical Research Institute, Information Based Medicine Program, Australia; Biomarker Discovery and Information-Based Medicine, The University of Newcastle and Hunter Medical Research Institute, Information Based Medicine Program and ARC Centre of Excellence in Bioinforma ...

  • Venue:
  • ICA3PP'11 Proceedings of the 11th International Conference on Algorithms and Architectures for Parallel Processing - Volume Part II
  • Year:
  • 2011

Abstract

Novel analytical techniques have dramatically enhanced our understanding of many application domains, including biological networks inferred from gene expression studies. However, the large datasets generated by these studies pose clear computational challenges. Several NP-hard combinatorial optimization problems arise naturally in the analysis of large networks, and solving them is difficult without specialized computing facilities (i.e., supercomputers). In this work, we address the data clustering problem for large-scale biological networks with a polynomial-time algorithm that uses reasonable computing resources and is limited by the available memory. We have adapted and improved the MSTkNN graph partitioning algorithm and redesigned it to take advantage of external memory (EM) algorithms. We evaluate the scalability and performance of the proposed algorithm on a well-known breast cancer microarray study and its associated dataset.