The Weighted Suffix Tree: An Efficient Data Structure for Handling Molecular Weighted Sequences and its Applications

Authors:
Costas S. Iliopoulos (1); Christos Makris (2); Yannis Panagis (2); Katerina Perdikuri (2); Evangelos Theodoridis (2); Athanasios Tsakalidis (3)
Affiliations:
(1) Department of Computer Science, King's College London, Strand, London WC2R 2LS, England. E-mail: csi@dcs.kcl.ac.uk
(2) Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece. E-mail: {makri,panagis,perdikur,theodori}@ceid.upatras.gr
(3) Research Academic Computer Technology Institute, N. Kazantzaki Str., Rio 26504 Patras, Greece. E-mail: tsak@cti.gr
Venue:
Fundamenta Informaticae
Year:
2006

Abstract

In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular weighted sequences can model important biological processes such as the DNA assembly process or the DNA-protein binding process. Pattern matching and the identification of repeated patterns in biological weighted sequences are therefore important procedures in the translation of gene expression and regulation. We present time- and space-efficient algorithms for constructing the weighted suffix tree, together with applications of the proposed data structure to problems from molecular biology: pattern matching, repeats discovery, discovery of the longest common subsequence of two weighted sequences, and computation of covers.
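
To make the notion of pattern matching in a weighted sequence concrete, the following is a minimal sketch. It is not the paper's weighted suffix tree, only a naive scan. It assumes the commonly used definition in which each position of a weighted sequence carries a probability for each symbol, and a pattern is said to occur at a position when the product of the per-position probabilities meets a chosen threshold. The names `occurrence_probability`, `find_occurrences`, the threshold values, and the example data are all hypothetical and do not come from the paper.

```python
from math import prod

# Assumed representation: one dict per position, mapping symbol -> probability.
WeightedSeq = list[dict[str, float]]


def occurrence_probability(seq: WeightedSeq, pattern: str, start: int) -> float:
    """Probability that `pattern` appears in `seq` beginning at index `start`.

    Computed as the product of the per-position symbol probabilities;
    a symbol missing from a position is treated as having probability 0.
    """
    if start < 0 or start + len(pattern) > len(seq):
        return 0.0
    return prod(seq[start + j].get(ch, 0.0) for j, ch in enumerate(pattern))


def find_occurrences(seq: WeightedSeq, pattern: str, threshold: float) -> list[int]:
    """Naive scan: start indices where the occurrence probability >= threshold."""
    last_start = len(seq) - len(pattern)
    return [
        i
        for i in range(last_start + 1)
        if occurrence_probability(seq, pattern, i) >= threshold
    ]


# Hypothetical five-position weighted DNA sequence (illustrative data only).
example: WeightedSeq = [
    {"A": 1.0},
    {"A": 0.5, "C": 0.5},
    {"G": 1.0},
    {"A": 0.25, "C": 0.75},
    {"G": 1.0},
]

if __name__ == "__main__":
    print(find_occurrences(example, "CG", 0.5))
    print(find_occurrences(example, "CG", 0.6))
```

The scan costs time proportional to the sequence length times the pattern length for every query. An index such as the weighted suffix tree described in the abstract is meant to avoid repeating that work across queries.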