The Weighted Suffix Tree: An Efficient Data Structure for Handling Molecular Weighted Sequences and its Applications

  • Authors:
  • Costas S. Iliopoulos; Christos Makris; Yannis Panagis; Katerina Perdikuri; Evangelos Theodoridis; Athanasios Tsakalidis

  • Affiliations:
  • Department of Computer Science, King's College London, Strand, London WC2R 2LS, England. E-mail: csi@dcs.kcl.ac.uk (Iliopoulos); Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece. E-mail: {makri,panagis,perdikur,theodori}@ceid.upatras.gr (Makris, Panagis, Perdikuri, Theodoridis); Research Academic Computer Technology Institute, N. Kazantzaki Str., Rio 26504 Patras, Greece. E-mail: tsak@cti.gr (Tsakalidis)

  • Venue:
  • Fundamenta Informaticae
  • Year:
  • 2006

Abstract

In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular weighted sequences can model important biological processes such as the DNA assembly process or the DNA-protein binding process. Thus, pattern matching and the identification of repeated patterns in biological weighted sequences are very important procedures in the translation of gene expression and regulation. We present time- and space-efficient algorithms for constructing the weighted suffix tree, together with applications of the proposed data structure to problems from molecular biology, such as pattern matching, repeat discovery, discovery of the longest common subsequence of two weighted sequences, and computation of covers.
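
The abstract does not define a weighted sequence, so the following is only an illustrative sketch under stated assumptions: each position holds a probability for every symbol that may occur there, a pattern's occurrence probability is the product of the per-position probabilities, and an occurrence counts only if that product meets a chosen threshold. The names `occurrence_probability`, `find_occurrences`, and `threshold` are hypothetical. This is a naive scan for pattern matching, not the weighted suffix tree or the authors' algorithms.

```python
from typing import Dict, List

# Assumed representation: one dict per position, mapping symbol -> probability.
WeightedSequence = List[Dict[str, float]]


def occurrence_probability(seq: WeightedSequence, pattern: str, start: int) -> float:
    """Product of the probabilities of the pattern's symbols from `start` on."""
    prob = 1.0
    for offset, symbol in enumerate(pattern):
        # A symbol that is absent at a position contributes probability 0.
        prob *= seq[start + offset].get(symbol, 0.0)
    return prob


def find_occurrences(seq: WeightedSequence, pattern: str, threshold: float) -> List[int]:
    """Naive scan: start positions where the pattern's probability >= threshold."""
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        if occurrence_probability(seq, pattern, start) >= threshold:
            hits.append(start)
    return hits


# Hypothetical toy data: positions 1 and 3 are uncertain between C and G.
example: WeightedSequence = [
    {"A": 1.0},
    {"C": 0.5, "G": 0.5},
    {"A": 1.0},
    {"C": 0.5, "G": 0.5},
    {"T": 1.0},
]

print(find_occurrences(example, "AC", 0.25))
print(find_occurrences(example, "ACAC", 0.5))
```

The scan recomputes a product at every start position, so it takes time proportional to the sequence length times the pattern length. An index such as the one the paper proposes is meant to avoid repeating that work for each query.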