Learning Syntax by Automata Induction

Authors:
Robert C. Berwick; Sam Pilato
Affiliations:
MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, Massachusetts 02139, U.S.A. (BERWICK%MIT-OZ@MIT-MC.ARPA); Brattle Research Corporation, 55 Wheeler Street, Cambridge, Massachusetts 02138, U.S.A.
Venue:
Machine Learning
Year:
1987

Citing 0
Cited 12

Learning to Understand Information on the Internet: An Example-Based Approach
Journal of Intelligent Information Systems - Special issue: next generation information technologies and systems

A scalable comparison-shopping agent for the World-Wide Web
AGENTS '97 Proceedings of the first international conference on Autonomous agents

Computational aspects of resilient data extraction from semistructured sources (extended abstract)
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Review of "Towards a theory of cognition and computing" by J. Gerard Wolff. Ellis Horwood Limited 1991.
Computational Linguistics - Special issue on inheritance: II

Acquisition of a lexicon from semantic representations of sentences
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics

The PSI/PHI architecture for prosodic parsing
COLING '88 Proceedings of the 12th conference on Computational linguistics - Volume 1

Ethological data mining: an automata-based approach to extract behavioral units and rules
Data Mining and Knowledge Discovery

Software agents: completing patterns and constructing user interfaces
Journal of Artificial Intelligence Research

Identifying hierarchical structure in sequences: a linear-time algorithm
Journal of Artificial Intelligence Research

Pattern extraction improves automata-based syntax analysis in songbirds
ACAL'07 Proceedings of the 3rd Australian conference on Progress in artificial life

Design patterns for metamodels
Proceedings of the compilation of the co-located workshops on DSM'11, TMC'11, AGERE!'11, AOOPES'11, NEAT'11, & VMIL'11

Constructing song syntax by automata induction
ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications

Abstract

In this paper we propose an explicit computer model for learning natural language syntax based on Angluin's (1982) efficient induction algorithms, using a complete corpus of grammatical example sentences. We use these results to show how inductive inference methods may be applied to learn substantial, coherent subparts of at least one natural language, English, that are not susceptible to the kinds of learning envisioned in linguistic theory. As two concrete case studies, we show how to learn English auxiliary verb sequences (such as "could be taking" and "will have been taking") and the sequences of articles and adjectives that appear before noun phrases (such as "the very old big deer"). Both systems can be acquired in a computationally feasible amount of time using either positive examples alone or, in an incremental mode, implicit negative examples (examples outside a finite corpus are considered to be negative examples). As far as we know, this is the first computer procedure that learns a full-scale range of noun subclasses and noun phrase structure. The generalizations and the time required for acquisition match our knowledge of child language acquisition for these two cases. More importantly, these results show that just where linguistic theories admit to highly irregular subportions, we can apply efficient automata-theoretic learning algorithms. Since the algorithm works only for fragments of language syntax, we do not believe that it suffices for all of language acquisition. Rather, we would claim that language acquisition is nonuniform and susceptible to a variety of acquisition strategies; this algorithm may be one of these.
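
To make the idea of automata induction from positive examples concrete, the following is a minimal sketch of the zero-reversible special case of Angluin's (1982) inference procedure. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which variant or parameters the paper uses, and the function names `infer_zero_reversible` and `accepts`, as well as the two-sentence corpus, are hypothetical. The sketch builds a prefix-tree acceptor from the example sentences, merges all accepting states, and then keeps merging states until the machine is deterministic both forwards and backwards.

```python
from itertools import count


def infer_zero_reversible(sentences):
    """Sketch of zero-reversible inference from positive examples only.

    Returns (transitions, start_state, final_state). All names here are
    illustrative assumptions, not taken from the paper.
    """
    # Step 1: build a prefix-tree acceptor over the example sentences.
    fresh = count(1)
    start = 0
    trans = {}        # (state, word) -> state
    finals = set()
    for sentence in sentences:
        state = start
        for word in sentence:
            if (state, word) not in trans:
                trans[(state, word)] = next(fresh)
            state = trans[(state, word)]
        finals.add(state)

    # Union-find over states; merged states share one representative.
    parent = list(range(next(fresh)))

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    def union(a, b):
        a, b = find(a), find(b)
        if a == b:
            return False
        parent[b] = a
        return True

    # Step 2: a zero-reversible acceptor has a single accepting state.
    finals = list(finals)
    for f in finals[1:]:
        union(finals[0], f)

    # Step 3: merge until no two arcs with the same word leave one state
    # for different targets (forward) or enter one state from different
    # sources (backward).
    changed = True
    while changed:
        changed = False
        forward, backward = {}, {}
        for (src, word), dst in trans.items():
            s, d = find(src), find(dst)
            other_d = forward.setdefault((s, word), d)
            other_s = backward.setdefault((d, word), s)
            if union(other_d, d) or union(other_s, s):
                changed = True
                break  # representatives changed; rescan from scratch

    quotient = {(find(s), w): find(d) for (s, w), d in trans.items()}
    final = find(finals[0]) if finals else None
    return quotient, find(start), final


def accepts(machine, sentence):
    """Run the inferred deterministic acceptor on a sequence of words."""
    trans, state, final = machine
    for word in sentence:
        if (state, word) not in trans:
            return False
        state = trans[(state, word)]
    return state == final


corpus = [
    ("the", "old", "deer"),
    ("the", "very", "old", "deer"),
]
machine = infer_zero_reversible(corpus)
print(accepts(machine, ("the", "very", "very", "old", "deer")))  # True
print(accepts(machine, ("the", "deer")))                         # False
```

From only two positive examples, the merging step turns "very" into a loop, so the inferred acceptor generalizes to any number of repetitions while still rejecting strings such as "the deer" that the corpus gives no evidence for.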