Counting and random generation of strings in regular languages
Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms
Introduction To Automata Theory, Languages, And Computation
Polynomial Time Inference of Extended Regular Pattern Languages
Proceedings of RIMS Symposium on Software Science and Engineering
STACS '94 Proceedings of the 11th Annual Symposium on Theoretical Aspects of Computer Science
Compactness and Learning of Classes of Unions of Erasing Regular Pattern Languages
ALT '02 Proceedings of the 13th International Conference on Algorithmic Learning Theory
RE-tree: an efficient index structure for regular expressions
The VLDB Journal — The International Journal on Very Large Data Bases
Inferring unions of the pattern languages by the most fitting covers
ALT'05 Proceedings of the 16th international conference on Algorithmic Learning Theory
Best fitting fixed-length substring patterns for a set of strings
COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics
Developments from enquiries into the learnability of the pattern languages from positive data
Theoretical Computer Science
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications
We consider the problem of finding a set of patterns that best characterizes a set of strings. To this end, Arimura et al. [3] considered the use of minimal multiple generalizations (mmgs) for such characterizations. Given any sample set, the mmgs are, roughly speaking, the most (syntactically) specific sets of languages within a given class of languages that contain the sample. Takae et al. [17] found the mmgs of the class of pattern languages [1] that includes so-called sort symbols to be fairly accurate as predictors for signal peptides. We first reproduce their results using updated data. Then, using a measure that estimates the level of over-generalization made by the mmgs, we show results that explain the high accuracy obtained with sort symbols, and discuss how better results can be obtained. The measure we suggest here can also be applied to other types of patterns, e.g. the PROSITE patterns [4].