Learning context-free grammar using improved tabular representation

Authors:
Olgierd Unold;Marcin Jaworski
Affiliations:
The Institute of Computer Engineering, Control and Robotics, Wroclaw University of Technology, Poland;The Institute of Computer Engineering, Control and Robotics, Wroclaw University of Technology, Poland
Venue:
Applied Soft Computing
Year:
2010

References (20):
Grammatical inference for even linear languages based on control sets. Information Processing Letters.
Learning context-free grammars from structural data in polynomial time. Theoretical Computer Science.
Efficient learning of context-free grammars from positive structural examples. Information and Computation.
The inference of tree languages from finite samples: an algebraic approach. Theoretical Computer Science.
An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics.
Genetic algorithms + data structures = evolution programs (3rd ed.).
A note on the grammatical inference problem for even linear languages. Fundamenta Informaticae.
Predicting Protein Secondary Structure Using Stochastic Tree Grammars. Machine Learning, Special issue on learning with probabilistic representations.
Inferring pure context-free languages from positive data. Acta Cybernetica.
Statistical Language Learning.
Stochastic Inference of Regular Tree Languages. Machine Learning.
Learning Context-Free Grammars from Partially Structured Examples. ICGI '00: Proceedings of the 5th International Colloquium on Grammatical Inference: Algorithms and Applications.
Stochastic k-testable Tree Languages and Applications. ICGI '02: Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications.
Generalized Stochastic Tree Automata for Multi-relational Data Mining. ICGI '02: Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications.
GA-based Learning of Context-Free Grammars using Tabular Representations. ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning.
Efficient Learning of Semi-structured Data from Queries. ALT '01: Proceedings of the 12th International Conference on Algorithmic Learning Theory.
Extracting grammar from programs: evolutionary approach. ACM SIGPLAN Notices.
Learning Context-Free Grammars from Partially Structured Examples: Juxtaposition of GCS with TBL. HIS '07: Proceedings of the 7th International Conference on Hybrid Intelligent Systems.
Identifying hierarchical structure in sequences: a linear-time algorithm. Journal of Artificial Intelligence Research.
Learning context-free grammars using tabular representations. Pattern Recognition.

Cited by (1):
A memetic grammar inference algorithm for language learning. Applied Soft Computing.

Abstract

This paper describes an improved version of the TBL algorithm [Y. Sakakibara, Learning context-free grammars using tabular representations, Pattern Recognition 38 (2005) 1372-1383; Y. Sakakibara, M. Kondo, GA-based learning of context-free grammars using tabular representations, in: Proceedings of the 16th International Conference on Machine Learning (ICML-99), Morgan Kaufmann, Los Altos, CA, 1999] for the inference of context-free grammars in Chomsky Normal Form. The TBL algorithm is a novel approach to overcoming the hardness of learning context-free grammars from examples when no structural information is available. The algorithm represents grammars by parsing tables; thanks to this tabular representation, the problem of grammar learning reduces to the problem of partitioning the set of nonterminals. A genetic algorithm is used to solve this NP-hard partitioning problem. The improved version applies a modified fitness function and a new specialized delete operator. Computer simulations were performed to determine the efficiency of the improved tabular representation. The experiments were divided into two groups: in the first, the unknown context-free grammar is learned without any extra information about grammatical structure; in the second, learning is supported by partial knowledge of the structure. In each experiment, the influence of the partition block size in the initial population and of the population size on grammar induction was tested. The experiments showed that the new version of the TBL algorithm is less sensitive to block size and population size than the standard one and finds solutions faster.
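The reduction from grammar learning to partitioning nonterminals can be illustrated with a small Python sketch. It assumes a simplified reading of the tabular representation: every split point of every cell in an example's parse table becomes a fresh nonterminal, start symbols are modeled as a set of accepting blocks rather than as productions, and CYK parsing tests membership. The helper names `primitive_grammar`, `apply_partition`, `accepts` and `fitness`, the hand-picked partition, and the accuracy-based fitness are hypothetical illustrations. They are not the paper's modified fitness function or its specialized delete operator, and the genetic search itself is omitted.

```python
from itertools import product


def splits(size):
    """Split points of a table cell; single symbols get one dummy split (0)."""
    return [0] if size == 1 else list(range(1, size))


def primitive_grammar(examples):
    """Primitive CNF grammar built from the tabular representation of examples.

    Simplifying assumption (illustrative): nonterminal (e, i, j, k) stands for
    substring w[i:i+j] of example e split after k symbols, and derives only it.
    """
    lexical, binary, start = set(), set(), set()
    for e, w in enumerate(examples):
        n = len(w)
        for i, a in enumerate(w):
            lexical.add(((e, i, 1, 0), a))          # X -> a
        for j in range(2, n + 1):                   # substring length
            for i in range(n - j + 1):              # substring start
                for k in range(1, j):               # split point
                    for l1, l2 in product(splits(k), splits(j - k)):
                        binary.add(((e, i, j, k),   # X -> Y Z
                                    (e, i, k, l1),
                                    (e, i + k, j - k, l2)))
        # Assumption: accepting nonterminals are the full-length cells.
        start.update((e, 0, n, k) for k in splits(n))
    return lexical, binary, start


def apply_partition(grammar, partition):
    """Merge nonterminals into blocks; unlisted ones stay singleton blocks."""
    lexical, binary, start = grammar

    def block(x):
        return partition.get(x, x)

    return ({(block(A), a) for A, a in lexical},
            {(block(A), block(B), block(C)) for A, B, C in binary},
            {block(S) for S in start})


def accepts(grammar, s):
    """CYK membership test; table[(i, size)] holds blocks deriving s[i:i+size]."""
    lexical, binary, start = grammar
    n = len(s)
    if n == 0:
        return False
    table = {(i, 1): {A for A, a in lexical if a == s[i]} for i in range(n)}
    for size in range(2, n + 1):
        for i in range(n - size + 1):
            cell = set()
            for k in range(1, size):
                left, right = table[(i, k)], table[(i + k, size - k)]
                cell |= {A for A, B, C in binary if B in left and C in right}
            table[(i, size)] = cell
    return bool(table[(0, n)] & start)


def fitness(grammar, positives, negatives):
    """Hypothetical fitness: fraction of examples classified correctly."""
    correct = sum(accepts(grammar, s) for s in positives)
    correct += sum(not accepts(grammar, s) for s in negatives)
    return correct / (len(positives) + len(negatives))


g0 = primitive_grammar(["aabb"])
# Hand-picked (hypothetical) partition of the "aabb" table into blocks S, T, A, B.
partition = {(0, 0, 4, 1): "S", (0, 1, 2, 1): "S", (0, 1, 3, 2): "T",
             (0, 0, 1, 0): "A", (0, 1, 1, 0): "A",
             (0, 2, 1, 0): "B", (0, 3, 1, 0): "B"}
g1 = apply_partition(g0, partition)
print([accepts(g0, s) for s in ["ab", "aabb", "aaabbb"]])  # [False, True, False]
print([accepts(g1, s) for s in ["ab", "aaabbb", "aab", "abab"]])  # [True, True, False, False]
print(fitness(g1, ["ab", "aaabbb"], ["aab", "abab"]))  # 1.0
```

Merging just seven of the primitive nonterminals built from `aabb` turns a grammar that accepts only `aabb` into one that also accepts `ab` and `aaabbb` (the language a^n b^n, n ≥ 1) while still rejecting `aab` and `abab`. In TBL the partition is searched for by the genetic algorithm rather than chosen by hand.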