Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text
LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Finding All Common Intervals of k Permutations
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Efficient text fingerprinting via Parikh mapping
Journal of Discrete Algorithms
Efficient one-dimensional real scaled matching
Journal of Discrete Algorithms
New algorithms for text fingerprinting
Journal of Discrete Algorithms
Finding Nested Common Intervals Efficiently
RECOMB-CG '09 Proceedings of the International Workshop on Comparative Genomics
A faster query algorithm for the text fingerprinting problem
ESA'07 Proceedings of the 15th annual European conference on Algorithms
Indexing a dictionary for subset matching queries
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Computation of median gene clusters
RECOMB'08 Proceedings of the 12th annual international conference on Research in computational molecular biology
Faster query algorithms for the text fingerprinting problem
Information and Computation
Algorithms for computing bidirectional best hit r-window gene clusters
FAW-AAIM'11 Proceedings of the 5th joint international frontiers in algorithmics, and 7th international conference on Algorithmic aspects in information and management
Output-Sensitive Algorithms for Finding the Nested Common Intervals of Two General Sequences
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Indexing a dictionary for subset matching queries
Algorithms and Applications
An algorithmic view on multi-related-segments: a unifying model for approximate common interval
TAMC'12 Proceedings of the 9th Annual international conference on Theory and Applications of Models of Computation
Parikh matching in the streaming model
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Various improvements to text fingerprinting
Journal of Discrete Algorithms
Given a string S over a finite alphabet Σ, the character set (also called the fingerprint) of a substring S' of S is the subset C ⊆ Σ of the symbols occurring in S'. The study of the character sets of all the substrings of a given string (or a given collection of strings) appears in several domains such as rule induction for natural language processing or comparative genomics. Several computational problems concerning the character sets of a string arise from these applications, especially: (1) Output all the maximal locations of substrings having a given character set. (2) Output for each character set C occurring in a given string (or a given collection of strings) all the maximal locations of C. Denoting by n the total length of the considered string or collection of strings, we solve the first problem in Θ(n) time using Θ(n) space. We present two algorithms solving the second problem. The first one runs in Θ(n²) time using Θ(n) space. The second algorithm has Θ(n |Σ| log |Σ|) time and Θ(n) space complexity and is an adaptation of an algorithm by Amir et al. [A. Amir, A. Apostolico, G.M. Landau, G. Satta, Efficient text fingerprinting via Parikh mapping, J. Discrete Algorithms 26 (2003) 1-13].
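As an illustration of problem (1), the following is a minimal Python sketch and not necessarily the algorithm of the paper. It assumes a definition the abstract does not spell out: a maximal location of C is an interval [i, j] of positions such that the symbols occurring in S[i..j] are exactly those of C and the interval cannot be extended on either side without introducing a symbol outside C. Under that assumption, the maximal locations are the maximal runs of symbols from C that contain every symbol of C, so a single left-to-right scan is enough. The function name `maximal_locations` and the 0-based inclusive interval convention are choices made for this example only.

```python
def maximal_locations(s, charset):
    """Return the maximal locations of `charset` in `s` as 0-based
    inclusive (start, end) pairs.

    Assumed definition (not taken from the abstract): an interval is a
    maximal location if its symbols are exactly `charset` and the symbol
    just before and just after it (if any) is outside `charset`.
    """
    target = set(charset)
    result = []
    start = None      # start of the current run of symbols from target
    seen = set()      # symbols of target seen in the current run

    for i, ch in enumerate(s):
        if ch in target:
            if start is None:
                start = i
                seen = set()
            seen.add(ch)
        else:
            # The run ends at i - 1; keep it only if it uses all of target.
            if start is not None and len(seen) == len(target):
                result.append((start, i - 1))
            start = None

    # Handle a run that reaches the end of the string.
    if start is not None and len(seen) == len(target):
        result.append((start, len(s) - 1))
    return result


if __name__ == "__main__":
    print(maximal_locations("abcabxba", "ab"))
    print(maximal_locations("abcabxba", "abc"))
```

Each position is visited once and set operations take expected constant time, so the scan is linear in the length of the string, in line with the Θ(n) bound stated for the first problem. For a collection of strings, the same scan could be applied to each string in turn.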