Efficient exact p-value computation and applications to biosequence analysis

Authors:
Gill Bejerano
Affiliations:
Hebrew University, Jerusalem, Israel
Venue:
RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Year:
2003

Abstract

Like other fields of the life sciences, bioinformatics has turned to capturing biological phenomena through probabilistic models, and to analysing these models using statistical methodology. A central computational problem in applying useful statistical procedures, such as hypothesis tests, is the computation of p-values. In this paper, we devise a branch and bound approach to efficient exact p-value computation, and apply it to a likelihood ratio test in a frequency table setting. By recursively partitioning the sample domain and bounding the statistic we avoid the explicit exhaustive enumeration of all possible outcomes, which is currently carried out by the standard statistical packages. The convexity of the test statistic is further utilized to confer additional speed-up.

Empirical evaluation demonstrates a reduction in the computational complexity of the algorithm, even in worst case scenarios, significantly extending the practical range for performing the exact test. We also show that speed-up improves greatly as the underlying null hypothesis becomes sparser; that computation precision actually increases with speed-up; and that computation time is only moderately affected by the magnitude of the computed p-value. These qualities make our algorithm an appealing alternative to the exhaustive test, the χ² asymptotic approximation and Monte Carlo samplers in the respective regimes.

The proposed method is readily extensible to other tests and test statistics of interest. We survey several examples of established biosequence analysis methods, where small sample size and sparseness do occur, and to which our computational framework could be applied to improve performance. We briefly demonstrate this with two applications: measuring binding site positional correlations in DNA, and detecting compensatory mutation events in functional RNA.
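The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: a multinomial goodness-of-fit setting with null probabilities `q` (all positive, summing to one) and the likelihood ratio statistic G = 2 Σ n_i ln(n_i / (n q_i)). Counts are assigned one category at a time. For a partial assignment, convexity gives a lower bound on G (the remaining count spread proportionally to the null) and an upper bound (the remaining count placed entirely in the least likely remaining category). If the lower bound already reaches the observed value, the whole subtree is added in one step through its marginal probability; if the upper bound falls short, the subtree is skipped. All function names are hypothetical, and the exhaustive variant is included only as a reference for comparison.

```python
import math


def xlogx_ratio(count, expected):
    """count * ln(count / expected), with the convention 0 * ln(0) = 0."""
    return count * math.log(count / expected) if count > 0 else 0.0


def g_statistic(counts, q):
    """Likelihood ratio statistic G for observed counts against null q."""
    n = sum(counts)
    return 2.0 * sum(xlogx_ratio(c, n * qi) for c, qi in zip(counts, q))


def exact_pvalue_bnb(observed, q, tol=1e-9):
    """Exact P(G >= G_observed) under the null, by branch and bound."""
    n, k = sum(observed), len(q)
    g_obs = g_statistic(observed, q)
    threshold = g_obs - tol * max(1.0, g_obs)  # tolerate rounding in ties

    # tail_mass[j] / tail_min[j]: sum / minimum of q[j:], used in the bounds.
    tail_mass = [0.0] * (k + 1)
    tail_min = [math.inf] * (k + 1)
    for i in range(k - 1, -1, -1):
        tail_mass[i] = tail_mass[i + 1] + q[i]
        tail_min[i] = min(tail_min[i + 1], q[i])

    def visit(j, remaining, partial_stat, log_coef):
        # Categories 0..j-1 are fixed; `remaining` counts go to categories j..k-1.
        lower = partial_stat + 2.0 * xlogx_ratio(remaining, n * tail_mass[j])
        upper = partial_stat + 2.0 * xlogx_ratio(remaining, n * tail_min[j])
        if lower >= threshold:
            # Every completion is at least as extreme: add the marginal
            # probability of the fixed prefix in a single step.
            log_tail = remaining * math.log(tail_mass[j]) if remaining else 0.0
            return math.exp(log_coef - math.lgamma(remaining + 1) + log_tail)
        if upper < threshold:
            return 0.0  # no completion can reach the observed value
        # Undecided: branch on the count of category j. With one category
        # left the two bounds coincide, so this point is never reached there.
        total = 0.0
        for c in range(remaining + 1):
            total += visit(
                j + 1,
                remaining - c,
                partial_stat + 2.0 * xlogx_ratio(c, n * q[j]),
                log_coef - math.lgamma(c + 1) + c * math.log(q[j]),
            )
        return total

    return visit(0, n, 0.0, math.lgamma(n + 1))


def exact_pvalue_exhaustive(observed, q, tol=1e-9):
    """Reference version: enumerate every outcome explicitly."""
    n, k = sum(observed), len(q)
    g_obs = g_statistic(observed, q)
    threshold = g_obs - tol * max(1.0, g_obs)

    def compositions(total, parts):
        if parts == 1:
            yield (total,)
            return
        for c in range(total + 1):
            for rest in compositions(total - c, parts - 1):
                yield (c,) + rest

    p = 0.0
    for counts in compositions(n, k):
        if g_statistic(counts, q) >= threshold:
            log_p = math.lgamma(n + 1) + sum(
                c * math.log(qi) - math.lgamma(c + 1) for c, qi in zip(counts, q)
            )
            p += math.exp(log_p)
    return p
```

For example, `exact_pvalue_bnb([7, 2, 1, 0], [0.4, 0.3, 0.2, 0.1])` should agree with `exact_pvalue_exhaustive` on the same input up to rounding, while visiting fewer nodes. The sketch makes no attempt to reproduce the paper's partitioning order or its additional convexity-based speed-ups.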