Mining cancer genes with running-sum statistics
Proceedings of the third international workshop on Data and text mining in bioinformatics
A GMM-IG framework for selecting genes as expression panel biomarkers
Artificial Intelligence in Medicine
Computational Biology and Chemistry
Permutation testing improves Bayesian network learning
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Integrative correlation: Properties and relation to canonical correlations
Journal of Multivariate Analysis
Hi-index | 3.84 |
Motivation: The proliferation of public data repositories creates a need for meta-analysis methods to efficiently evaluate, integrate and validate related datasets produced by independent groups. A t-based approach has been proposed to integrate effect size from multiple studies by modeling both intra-and between-study variation. Recently, a non-parametric ‘rank product’ method, which is derived based on biological reasoning of fold-change criteria, has been applied to directly combine multiple datasets into one meta study. Fisher's Inverse χ2 method, which only depends on P-values from individual analyses of each dataset, has been used in a couple of medical studies. While these methods address the question from different angles, it is not clear how they compare with each other. Results: We comparatively evaluate the three methods; t-based hierarchical modeling, rank products and Fisher's Inverse χ2 test with P-values from either the t-based or the rank product method. A simulation study shows that the rank product method, in general, has higher sensitivity and selectivity than the t-based method in both individual and meta-analysis, especially in the setting of small sample size and/or large between-study variation. Not surprisingly, Fisher's χ2 method highly depends on the method used in the individual analysis. Application to real datasets demonstrates that meta-analysis achieves more reliable identification than an individual analysis, and rank products are more robust in gene ranking, which leads to a much higher reproducibility among independent studies. Though t-based meta-analysis greatly improves over the individual analysis, it suffers from a potentially large amount of false positives when P-values serve as threshold. We conclude that careful meta-analysis is a powerful tool for integrating multiple array studies. Contact: fxhong@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.