An algorithm is presented for extracting and localizing a common structure in a family of strings, with time complexity $O(N^2 L^2 \log_2 L)$, where N is the number of strings and L is their maximum length. This structure appears as alignments of words that are similar but not necessarily identical and that occur at approximately the same location in all the strings. The method works in two successive stages. First, a fast algorithm draws up a directory of exactly repeated patterns that appear in a given majority of the strings. Second, the algorithm recursively constructs anchoring patterns by a divide-and-conquer strategy and converges on a maximum number of alignments. The method could be extended to two-dimensional image analysis. The algorithm has been applied to find common, a priori unknown features in families of biological macromolecules, with quite good results. One of these families included 23 strings of about 100 characters each. Each characteristic structure was obtained in less than one minute on a MULTIX-DPS8 system.
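The first stage, a directory of exactly repeated patterns shared by a majority of the strings, can be illustrated with a minimal sketch. This is not the authors' fast algorithm: it is a naive enumeration of fixed-length substrings, and the function name `repeated_patterns`, the parameters `length` and `quorum`, and the sample strings are all assumptions introduced for illustration.

```python
from collections import defaultdict


def repeated_patterns(strings, length, quorum):
    """Naive directory of exact repeats (illustrative only).

    Returns a dict mapping each substring of the given length to
    {string index: [start positions]}, keeping only substrings that
    occur in at least `quorum` distinct strings.
    """
    occurrences = defaultdict(lambda: defaultdict(list))
    for idx, s in enumerate(strings):
        # Slide a window of the requested length over each string.
        for start in range(len(s) - length + 1):
            occurrences[s[start:start + length]][idx].append(start)
    # Keep only words present in enough distinct strings.
    return {
        word: dict(by_string)
        for word, by_string in occurrences.items()
        if len(by_string) >= quorum
    }


# Hypothetical family of three short strings.
family = ["GATTACA", "TTACAGG", "CATTACG"]
directory = repeated_patterns(family, length=4, quorum=3)
print(directory)  # {'TTAC': {0: [2], 1: [0], 2: [2]}}
```

Recording the start positions alongside each word is what would let a later stage check whether the occurrences fall at approximately the same location in every string, as the abstract requires of an anchoring pattern.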