Effects of cDNA microarray time-series data size on gene regulatory network inference accuracy

  • Authors:
  • Vijender Chaitankar;Preetam Ghosh;Chaoyang Zhang;Ping Gong;Edward J. Perkins

  • Affiliations:
  • The Univ of Southern Mississippi, Hattiesburg MS;The Univ of Southern Mississippi, Hattiesburg MS;The Univ of Southern Mississippi, Hattiesburg MS;SpecPro Inc, Vicksburg, MS;U.S. Army Engineer Research and Development Center, Vicksburg, MS

  • Venue:
  • Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
  • Year:
  • 2010

Abstract

A number of models and algorithms have been proposed for gene regulatory network (GRN) inference; however, none of them addresses how the size of the time-series microarray expression data, measured in number of time points, affects inference. In this paper, we study this problem by analyzing the behavior of two algorithms based on information-theoretic models. We applied these algorithms to datasets of different sizes generated by synthetic network generation tools. Experiments show that the performance of these algorithms saturates beyond a specific data size, which gives biologists an indication of the data size needed for the best inference accuracy. Because accuracy saturates after a specific number of time points (with the saturation point differing between algorithms), generating time-series data for many more time points will not necessarily improve inference accuracy beyond that point. In investigating this saturation, we found that the information-theoretic quantity mutual information (MI) tends to zero as the number of time points increases, even though the entropy in the network rises to unity. This suggests that MI might not be the best metric for GRN inference algorithms. To modify the MI metric, we introduce a new method of computing time lags between any pair of genes and present the time-lagged mutual information (TLMI) metric for reverse engineering of GRNs.
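The abstract contrasts plain MI with time-lagged MI. The following is a minimal sketch that makes the distinction concrete. It assumes expression values have already been discretized into a few states and uses a simple plug-in (frequency-count) estimator. The function names and the rule of choosing the lag that maximizes lagged MI are hypothetical illustrations. They are not the authors' lag-computation method, which the abstract does not describe.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in MI estimate (in bits) between two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)
    if n == 0:
        return 0.0
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) )
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def time_lagged_mi(x, y, lag):
    """MI between x at time t and y at time t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag >= len(x):
        return 0.0
    return mutual_information(x[:len(x) - lag], y[lag:])


def best_lag(x, y, max_lag):
    """Hypothetical lag rule: pick the lag in [0, max_lag] with the highest lagged MI."""
    scores = {k: time_lagged_mi(x, y, k) for k in range(max_lag + 1)}
    return max(scores, key=scores.get), scores


# Toy example: gene b repeats gene a's (binarized) state one time step later.
a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
b = [0] + a[:-1]
lag, scores = best_lag(a, b, max_lag=3)
print(lag)  # 1
```

In this toy series the lag-0 MI is close to zero (about 0.03 bits), while lag 1 recovers nearly all of gene a's entropy. Plug-in estimates from short series are biased, so the discretization and estimator choices matter in practice.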