This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr's), longest common subsequences (LCSeq's), and window-accumulated longest common subsequences (wLCSeq's). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set) is NP-hard for all of the above measurements. To obtain practically efficient algorithms for finding the best classifier, we investigate pruning heuristics and fast string matching techniques based on the properties of the local relatedness measurements.
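To make two of the measurements above concrete, the following is a minimal Python sketch of LCStr and LCSeq lengths computed by standard dynamic programming, together with a threshold rule that turns either measurement into a classifier. The threshold rule, the `accuracy` scoring, and all function names are illustrative assumptions; the abstract does not specify how the paper defines or scores a classifier, and wLCSeq is left out because its exact definition is not given here. This is not the paper's pruning-based search, only a direct evaluation of one candidate pattern.

```python
from typing import Callable, Sequence


def lcstr_len(a: str, b: str) -> int:
    """Length of the longest common substring (contiguous) of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-1] and b[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def lcseq_len(a: str, b: str) -> int:
    """Length of the longest common subsequence (gaps allowed) of a and b."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[len(b)]


Measure = Callable[[str, str], int]


def classify(pattern: str, text: str, threshold: int, measure: Measure) -> bool:
    """Assumed rule: label text positive if its relatedness to pattern reaches threshold."""
    return measure(pattern, text) >= threshold


def accuracy(pattern: str, threshold: int, measure: Measure,
             positives: Sequence[str], negatives: Sequence[str]) -> float:
    """Fraction of strings labelled correctly (a hypothetical score, not the paper's)."""
    correct = sum(classify(pattern, s, threshold, measure) for s in positives)
    correct += sum(not classify(pattern, s, threshold, measure) for s in negatives)
    return correct / (len(positives) + len(negatives))
```

As a usage example, with positives `["abcab", "xxabc"]`, negatives `["abxc", "zzz"]`, pattern `"abc"`, and threshold 3, the LCStr-based rule separates the two sets, while the LCSeq-based rule also accepts `"abxc"` because the subsequence `abc` survives the inserted `x`. Evaluating a single pattern this way takes quadratic time per string; the difficulty the abstract refers to lies in searching over all candidate patterns.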