Character sets of strings

Authors:
Gilles Didier;Thomas Schmidt;Jens Stoye;Dekel Tsur
Affiliations:
Centro de Modelamiento Matematico CNRS UMR 2071 Santiago de Chile, Chile;International NRW Graduate School in Bioinformatics and Genome Research, Center of Biotechnology, Universität Bielefeld, 33594 Bielefeld, Germany;Technische Fakultät, Universität Bielefeld, 33594 Bielefeld, Germany;Computer Science Department, Ben-Gurion University, Israel
Venue:
Journal of Discrete Algorithms
Year:
2007

Citing 4
Cited 13

Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text

Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text
The LCA Problem Revisited

LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Finding All Common Intervals of k Permutations

CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Efficient text fingerprinting via Parikh mapping

Journal of Discrete Algorithms

Efficient one-dimensional real scaled matching

Journal of Discrete Algorithms
New algorithms for text fingerprinting

Journal of Discrete Algorithms
Finding Nested Common Intervals Efficiently

RECOMB-CG '09 Proceedings of the International Workshop on Comparative Genomics
A faster query algorithm for the text fingerprinting problem

ESA'07 Proceedings of the 15th annual European conference on Algorithms
Indexing a dictionary for subset matching queries

SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Computation of median gene clusters

RECOMB'08 Proceedings of the 12th annual international conference on Research in computational molecular biology
Faster query algorithms for the text fingerprinting problem

Information and Computation
Algorithms for computing bidirectional best hit r-window gene clusters

FAW-AAIM'11 Proceedings of the 5th joint international frontiers in algorithmics, and 7th international conference on Algorithmic aspects in information and management
Output-Sensitive Algorithms for Finding the Nested Common Intervals of Two General Sequences

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Indexing a dictionary for subset matching queries

Algorithms and Applications
An algorithmic view on multi-related-segments: a unifying model for approximate common interval

TAMC'12 Proceedings of the 9th Annual international conference on Theory and Applications of Models of Computation
Parikh matching in the streaming model

SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Various improvements to text fingerprinting

Journal of Discrete Algorithms

Abstract

Given a string S over a finite alphabet Σ, the character set (also called the fingerprint) of a substring S′ of S is the subset C ⊆ Σ of the symbols occurring in S′. The study of the character sets of all the substrings of a given string (or a given collection of strings) appears in several domains such as rule induction for natural language processing or comparative genomics. Several computational problems concerning the character sets of a string arise from these applications, especially: (1) Output all the maximal locations of substrings having a given character set. (2) Output for each character set C occurring in a given string (or a given collection of strings) all the maximal locations of C. Denoting by n the total length of the considered string or collection of strings, we solve the first problem in Θ(n) time using Θ(n) space. We present two algorithms solving the second problem. The first one runs in Θ(n²) time using Θ(n) space. The second algorithm has Θ(n|Σ| log |Σ|) time and Θ(n) space complexity and is an adaptation of an algorithm by Amir et al. [A. Amir, A. Apostolico, G.M. Landau, G. Satta, Efficient text fingerprinting via Parikh mapping, J. Discrete Algorithms 26 (2003) 1-13].
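
To make the two problems concrete, here is a minimal Python sketch. It is not the authors' algorithms. It assumes, since the abstract does not define the term, that a location (i, j) of a character set C is "maximal" when the substring from position i to position j has character set exactly C and cannot be extended to the left or right without adding a symbol outside C. The function names maximal_locations and all_maximal_locations are hypothetical. The first function scans the string once, which matches the linear time bound stated for problem (1) up to hashing; the second is a naive quadratic enumeration for problem (2) and makes no attempt to meet the stated space bound.

```python
def maximal_locations(s, charset):
    """Problem (1): 0-based inclusive (start, end) pairs of maximal
    substrings of s whose character set is exactly `charset`.
    Assumes 'maximal' means not extendable without leaving charset."""
    target = set(charset)
    if not target:
        return []
    locations = []
    start = None   # start of the current run of symbols drawn from target
    seen = set()   # distinct symbols seen in the current run
    for i, ch in enumerate(s):
        if ch in target:
            if start is None:
                start, seen = i, set()
            seen.add(ch)
        else:
            # The run ends here; it is a location only if it used all of target.
            if start is not None and len(seen) == len(target):
                locations.append((start, i - 1))
            start = None
    if start is not None and len(seen) == len(target):
        locations.append((start, len(s) - 1))
    return locations


def all_maximal_locations(s):
    """Problem (2), naive version: map each character set occurring in s
    to the list of its maximal locations."""
    n = len(s)
    result = {}
    for i in range(n):
        seen = set()
        for j in range(i, n):
            seen.add(s[j])  # seen is now the character set of s[i..j]
            left_ok = i == 0 or s[i - 1] not in seen
            right_ok = j == n - 1 or s[j + 1] not in seen
            if left_ok and right_ok:
                result.setdefault(frozenset(seen), []).append((i, j))
    return result


print(maximal_locations("abcab", "ab"))  # [(0, 1), (3, 4)]
```

In the string "abcab", the character set {a, b} has two maximal locations under this assumption, separated by the symbol c, while {a, b, c} has the single maximal location covering the whole string.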