Learning Local Languages and Their Application to DNA Sequence Analysis

Authors:
Takashi Yokomori;Satoshi Kobayashi
Affiliations:
Waseda Univ., Tokyo, Japan;Tokyo Denki Univ., Saitama, Japan
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
1998

Citing 4
Cited 11

Inference of k-Testable Languages in the Strict Sense and Application to Syntactic Pattern Recognition

IEEE Transactions on Pattern Analysis and Machine Intelligence
On Polynomial-Time Learnability in the Limit of Strictly Deterministic Automata

Machine Learning
Inference of Reversible Languages

Journal of the ACM (JACM)
Inductive Inference: Theory and Methods

ACM Computing Surveys (CSUR)

Locality, Reversibility, and Beyond: Learning Languages from Positive Data

ALT '98 Proceedings of the 9th International Conference on Algorithmic Learning Theory
Polynomial-time identification of very simple grammars from positive data

Theoretical Computer Science - Selected papers in honour of Setsuo Arikawa
New Morphic Characterizations of Languages in Chomsky Hierarchy Using Insertion and Locality

LATA '09 Proceedings of the 3rd International Conference on Language and Automata Theory and Applications
Structural analysis of regulatory DNA sequences using grammar inference and Support Vector Machine

Neurocomputing
Transducer inference by assembling specific languages

ICGI'10 Proceedings of the 10th international colloquium conference on Grammatical inference: theoretical results and applications
Morphic characterizations of languages in Chomsky hierarchy with insertion and locality

Information and Computation
Mutation systems

LATA'11 Proceedings of the 5th international conference on Language and automata theory and applications
Learning analysis by reduction from positive data

ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications
Planar languages and learnability

ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications
Protein motif prediction by grammatical inference

ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications
Morphic characterizations with insertion systems controlled by a context of length one

Theoretical Computer Science

Quantified Score

Hi-index	0.14

Abstract

This paper concerns an efficient algorithm for learning in the limit a special type of regular languages called strictly locally testable languages from positive data, and its application to identifying the protein α-chain region in amino acid sequences. First, we present a linear time algorithm that, given a strictly locally testable language, learns (identifies) its deterministic finite state automaton in the limit from only positive data. This provides us with a practical and efficient method for learning a specific concept domain of sequence analysis. We then describe several experimental results using the learning algorithm developed above. Following a theoretical observation which strongly suggests that a certain type of amino acid sequences can be expressed by a locally testable language, we apply the learning algorithm to identifying the protein α-chain region in amino acid sequences for hemoglobin. Experimental scores show an overall success rate of 95 percent correct identification for positive data, and 96 percent for negative data.
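
To make the idea of learning a strictly locally testable language from positive data concrete, the following is a minimal sketch and not the paper's linear-time algorithm. It assumes the common textbook formulation in which a k-testable language is summarized by its allowed prefixes and suffixes of length k-1 and its allowed substrings of length k. The function names `learn_k_testable` and `accepts`, the handling of strings shorter than k by exact membership, and the toy sample strings are all illustrative assumptions; the paper instead constructs a deterministic finite state automaton.

```python
def learn_k_testable(samples, k):
    """Collect prefix, suffix, and k-gram sets from positive samples only.

    Assumed simplification: strings shorter than k are stored verbatim.
    """
    prefixes, suffixes, factors, short = set(), set(), set(), set()
    for w in samples:
        if len(w) < k:
            short.add(w)
            continue
        prefixes.add(w[:k - 1])              # first k-1 symbols
        suffixes.add(w[len(w) - (k - 1):])   # last k-1 symbols
        for i in range(len(w) - k + 1):
            factors.add(w[i:i + k])          # every window of length k
    return prefixes, suffixes, factors, short


def accepts(model, w, k):
    """Check a string against the sets collected by learn_k_testable."""
    prefixes, suffixes, factors, short = model
    if len(w) < k:
        return w in short
    if w[:k - 1] not in prefixes:
        return False
    if w[len(w) - (k - 1):] not in suffixes:
        return False
    return all(w[i:i + k] in factors for i in range(len(w) - k + 1))


# Toy positive data (hypothetical, not taken from the paper's experiments).
model = learn_k_testable(["abab", "ababab"], k=2)
print(accepts(model, "abababab", 2))  # True: generalizes beyond the samples
print(accepts(model, "abba", 2))      # False: "bb" was never observed
```

Each new positive example can only enlarge the four sets, so the hypothesis grows monotonically as more data arrives, which is the intuition behind identification in the limit for this class.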