Measures relating word frequencies to their expectations have long been of interest in bioinformatics studies. With sequence data becoming massively available, exhaustive enumeration of such measures has become conceivable, yet it poses a significant computational burden even when limited to words of bounded maximum length. In addition, displaying the huge tables that may result from these counts poses practical problems of visualization and inference.

VERBUMCULUS is a suite of software tools for the efficient detection of over- or under-represented words in nucleotide sequences. The inner core of VERBUMCULUS rests on subtly interwoven properties of statistics, pattern matching, and combinatorics on words that make it possible to limit drastically, and a priori, the set of over- or under-represented candidate words of all lengths in a given sequence, thereby making it more feasible both to detect and to visualize such words in a fast and practically useful way. This paper describes the facility and reports experimental results, ranging from simulations on synthetic data to the discovery of regulatory elements in the upstream regions of a set of yeast genes.

The VERBUMCULUS software is accessible at http://www.cs.ucr.edu/~stelo/Verbumculus/ or http://wwwdbl.dei.unipd.it/Verbumculus/
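
To make the notion of an over- or under-represented word concrete, the following is a minimal sketch of the naive computation that the abstract describes as burdensome: count every word of one fixed length and compare each count with its expectation. It is not the VERBUMCULUS algorithm. The function name `word_scores`, the independent-bases background model, and the binomial variance approximation (which ignores the effect of self-overlapping words) are all assumptions made for illustration.

```python
from collections import Counter
from math import prod, sqrt


def word_scores(seq, k):
    """Return {word: (observed, expected, z)} for each length-k word in seq.

    Illustrative assumptions (not taken from the paper):
    - bases are independent, with probabilities estimated from seq itself;
    - the variance is the binomial n*p*(1-p), ignoring word self-overlap.
    """
    n = len(seq) - k + 1  # number of positions where a length-k word can start
    if n <= 0:
        return {}

    # Empirical probability of each base in the sequence.
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}

    # Observed count of every length-k word that occurs at least once.
    observed = Counter(seq[i:i + k] for i in range(n))

    scores = {}
    for word, obs in observed.items():
        p = prod(base_freq[b] for b in word)  # probability of the word at one position
        expected = n * p
        var = n * p * (1 - p)
        z = (obs - expected) / sqrt(var) if var > 0 else 0.0
        scores[word] = (obs, expected, z)
    return scores


if __name__ == "__main__":
    # Rank words from most over-represented to most under-represented.
    result = word_scores("ACGTACGTACGT", 2)
    for word, (obs, exp, z) in sorted(result.items(), key=lambda kv: -kv[1][2]):
        print(word, obs, round(exp, 4), round(z, 2))
```

A sketch like this only scores words that actually occur and handles one length at a time; repeating it for every length up to some bound is exactly the exhaustive enumeration whose cost and output size motivate the a priori pruning of candidate words mentioned above.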