Bayesian learning of probabilistic language models
Stochastic Complexity in Statistical Inquiry
The Unsupervised Acquisition of a Lexicon from Continuous Speech
Bayesian grammar induction for language modeling
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Inside-outside reestimation from partially bracketed corpora
ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics
Unsupervised discovery of morphemes
MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Unsupervised tokenization for machine translation
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Predicting the semantic compositionality of prefix verbs
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
This paper addresses the problem of learning language from unprocessed text and speech signals, concentrating on the acquisition of a lexicon. In particular, it argues for a representation of language in which linguistic parameters such as words are built by perturbing a composition of existing parameters. The power of the representation is demonstrated through several examples: text segmentation and compression, acquisition of a lexicon from raw speech, and acquisition of mappings between text and artificial representations of meaning.
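The core idea, building new lexical entries as compositions of existing ones and keeping them only when they shorten the total description of the data, can be illustrated with a toy minimum-description-length sketch. This is not the paper's actual algorithm; it is a minimal, assumed illustration in which the lexicon starts as single characters, candidate entries are the most frequent pairs of adjacent tokens, and the lexicon cost is a crude flat per-character charge:

```python
import math
from collections import Counter

def viterbi_segment(text, costs):
    """Segment text into lexicon entries, minimizing total code length.

    costs maps each entry to its code length in bits (-log2 probability).
    Returns (tokens, total_bits)."""
    max_len = max(len(w) for w in costs)
    n = len(text)
    best = [math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            w = text[j:i]
            if w in costs and best[j] + costs[w] < best[i]:
                best[i] = best[j] + costs[w]
                back[i] = j
    tokens, i = [], n
    while i > 0:               # trace the cheapest path back
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1], best[n]

def estimate_costs(tokens):
    """Maximum-likelihood code lengths from token frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}

def learn_lexicon(text, rounds=20):
    """Greedy MDL-style lexicon induction (toy sketch, not the paper's method).

    Start from single characters; repeatedly propose the most frequent pair
    of adjacent tokens as a new entry composed of existing entries, keeping
    it only if it lowers the total description length (encoding bits plus
    an assumed flat 8-bit-per-character lexicon charge)."""
    costs = estimate_costs(list(text))
    lexicon = set(costs)

    def total_dl(costs, lexicon):
        _, bits = viterbi_segment(text, costs)
        return bits + sum(8 * len(w) for w in lexicon)

    current = total_dl(costs, lexicon)
    for _ in range(rounds):
        tokens, _ = viterbi_segment(text, costs)
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        cand = a + b                      # new entry = composition of two old ones
        if cand in lexicon:
            break
        merged, i = [], 0                 # re-tokenize with the candidate merged
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(cand)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        new_costs = estimate_costs(merged)
        new_dl = total_dl(new_costs, lexicon | {cand})
        if new_dl < current:              # keep the entry only if it pays for itself
            lexicon, costs, current = lexicon | {cand}, new_costs, new_dl
        else:
            break
    return lexicon
```

On a highly repetitive input such as `"abcabc" * 20`, the sketch promotes `"ab"` and then `"abc"` into the lexicon, while rejecting `"abcabc"` because the saved encoding bits no longer cover the added lexicon cost, mirroring the compression-driven trade-off the abstract describes.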