Theoretical Computer Science
Assessing the statistical significance of over-represented exceptional words is becoming an important task in computational biology. We show on two problems how large deviation methodology applies. First, when some oligomer H occurs more often than expected, i.e. may be over-represented, large deviations allow for a very efficient computation of the so-called p-value. The second problem we address is the possible change in the oligomer distribution induced by the over-representation of some pattern. Discarding this noise allows for the detection of weaker signals. Related algorithmic and complexity issues are discussed and compared to previous results. The approach is illustrated with three typical examples of applications on biological data.
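To give a feel for why a large deviation estimate of a p-value is cheap to compute, here is a minimal sketch under a deliberately simplified model. It is not the method of the paper: it assumes that occurrences of the oligomer at the n candidate positions are independent with a common probability p, so that the count is Binomial(n, p). This ignores the dependence between overlapping occurrences, which a careful treatment of word counts has to handle. The function names and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
import math


def kl_bernoulli(a, p):
    """Kullback-Leibler divergence D(a || p) between Bernoulli(a) and Bernoulli(p), in nats."""
    def term(x, y):
        # Convention: 0 * log(0 / y) = 0.
        return 0.0 if x == 0.0 else x * math.log(x / y)
    return term(a, p) + term(1.0 - a, 1.0 - p)


def chernoff_log_pvalue(n, k, p):
    """Large deviation (Chernoff) upper bound on log P(X >= k), X ~ Binomial(n, p).

    Assumes 0 < p < 1 and 0 <= k <= n. Costs O(1) whatever n and k are.
    """
    a = k / n
    if a <= p:
        return 0.0  # count not above expectation: the bound is trivial
    return -n * kl_bernoulli(a, p)


def exact_log_pvalue(n, k, p):
    """Exact log P(X >= k) for the same binomial model, summing n - k + 1 terms."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log1p(-p)
        for j in range(k, n + 1)
    ]
    m = max(logs)  # log-sum-exp to avoid underflow
    return m + math.log(sum(math.exp(x - m) for x in logs))


# Hypothetical example: a hexamer in a uniform i.i.d. model of a 10,000-position sequence.
n, p, k = 10_000, 0.25 ** 6, 12
print("large deviation bound on log p-value:", chernoff_log_pvalue(n, k, p))
print("exact binomial log p-value:          ", exact_log_pvalue(n, k, p))
```

The bound is evaluated in constant time, while the exact tail requires a sum whose length grows with the sequence. The bound is conservative: it always lies at or above the exact log p-value in this model.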