Fast searching of a text for a pattern that contains Classes of characters and Bounded size Gaps (CBG) has a wide range of applications, one of the most important being protein pattern matching. For instance, one PROSITE protein site is associated with the CBG [RK]-x(2,3)-[DE]-x(2,3)-Y, where the brackets match any one of the letters inside and x(2,3) denotes a gap of length between 2 and 3. Currently, the only way to search for a CBG in a text is to convert it into a full regular expression (RE). However, an RE is more expressive than a CBG, and searching with an RE pattern matching algorithm complicates the search and makes it slow. For this reason, we design in this article two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques. The first one looks at each text character exactly once. The second one does not need to consider all the text characters and is therefore usually faster than the first, but in bad cases it may have to read the same text character more than once. We then propose a criterion, based on the form of the CBG, for choosing a priori the faster of the two. We performed many practical experiments using the PROSITE database, and all of them show that our algorithms are the fastest in virtually all cases.
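To make the CBG notation concrete, the following minimal sketch illustrates the baseline approach the abstract describes, namely converting a CBG into a regular expression and searching with a general RE engine. It is not either of the two algorithms proposed in the article. The function names `cbg_to_regex` and `find_cbg` are hypothetical, and the sketch assumes the pattern is written with the hyphen separators PROSITE uses and contains only single letters, bracketed classes, and `x(a,b)` gaps.

```python
import re


def cbg_to_regex(cbg: str) -> str:
    """Translate a PROSITE-style CBG into Python regex syntax.

    Assumption: elements are separated by '-', and each element is a
    single letter, a bracketed class such as [RK], 'x', or a gap x(a,b).
    """
    parts = []
    for element in cbg.replace(" ", "").split("-"):
        gap = re.fullmatch(r"x\((\d+)(?:,(\d+))?\)", element)
        if gap:
            low = gap.group(1)
            high = gap.group(2) or low  # x(3) means exactly 3
            parts.append(".{%s,%s}" % (low, high))
        elif element == "x":
            parts.append(".")  # a single arbitrary character
        else:
            parts.append(element)  # letter or class, valid regex as-is
    return "".join(parts)


def find_cbg(cbg: str, text: str) -> list:
    """Return every start position at which the CBG matches in text."""
    # The lookahead lets overlapping occurrences be reported too.
    pattern = re.compile("(?=" + cbg_to_regex(cbg) + ")")
    return [m.start() for m in pattern.finditer(text)]


if __name__ == "__main__":
    site = "[RK]-x(2,3)-[DE]-x(2,3)-Y"
    print(cbg_to_regex(site))
    print(find_cbg(site, "AAKGGDTTYAA"))
```

Here the pattern becomes `[RK].{2,3}[DE].{2,3}Y`, and the sample text matches at position 2 (the `K`, followed by a gap of 2, `D`, a gap of 2, and `Y`). A general RE engine handles this correctly, but it does not exploit the restricted form of a CBG, which is the inefficiency the article sets out to address.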