Large scale inference of deterministic transductions: Tenjinno problem 1

Authors:
Alexander Clark
Affiliations:
Department of Computer Science, University of London, Egham, Surrey
Venue:
ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications
Year:
2006

Abstract

We discuss the problem of large-scale grammatical inference in the context of the Tenjinno competition, focusing on the inference of deterministic finite-state transducers, and describe the design of the algorithms and the implementation of the program that solved the first problem. Although the OSTIA algorithm has good asymptotic guarantees for this class of problems, the amount of data it requires is prohibitive. We therefore developed a new strategy for inferring large-scale transducers that is better suited to large random instances of the type in question. It combines traditional state-merging algorithms for the inference of finite-state automata with EM-based alignment algorithms and state-splitting algorithms.
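
For readers unfamiliar with the object being inferred, the following is a minimal sketch of a deterministic finite-state transducer and how it maps an input string to an output string. It is illustrative only and is not taken from the paper: the class name `DeterministicTransducer`, its methods, and the toy transitions are all assumptions, and it shows neither OSTIA nor the inference strategy described in the abstract.

```python
from typing import Dict, Optional, Tuple


class DeterministicTransducer:
    """Toy deterministic finite-state transducer (hypothetical, for illustration)."""

    def __init__(self, start: int) -> None:
        self.start = start
        # (state, input symbol) -> (next state, output string emitted on that step)
        self.delta: Dict[Tuple[int, str], Tuple[int, str]] = {}
        # accepting state -> output string appended when the input ends there
        self.final: Dict[int, str] = {}

    def add_transition(self, src: int, symbol: str, dst: int, output: str) -> None:
        key = (src, symbol)
        # Determinism: at most one transition per (state, input symbol) pair.
        if key in self.delta and self.delta[key] != (dst, output):
            raise ValueError(f"conflicting transition for {key}")
        self.delta[key] = (dst, output)

    def transduce(self, word: str) -> Optional[str]:
        """Return the output for `word`, or None if the input is rejected."""
        state = self.start
        pieces = []
        for symbol in word:
            step = self.delta.get((state, symbol))
            if step is None:
                return None  # no transition defined: input not in the domain
            state, emitted = step
            pieces.append(emitted)
        if state not in self.final:
            return None  # input ended in a non-accepting state
        pieces.append(self.final[state])
        return "".join(pieces)


def build_example() -> DeterministicTransducer:
    """Two-state toy machine; the symbols and outputs are arbitrary."""
    t = DeterministicTransducer(start=0)
    t.add_transition(0, "a", 1, "x")
    t.add_transition(1, "b", 0, "yz")
    t.final[0] = ""
    t.final[1] = "!"
    return t
```

Because each state has at most one transition per input symbol, the output for a given input is unique whenever it is defined. That uniqueness is what makes learning from input/output pairs by merging states well-posed.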