Extracting constraints on word usage from large text corpora

Authors:
Kathleen McKeown;Diane Litman;Rebecca Passonneau
Affiliations:
Columbia University;Columbia University;Columbia University
Venue:
HLT '91 Proceedings of the workshop on Speech and Natural Language
Year:
1992

Citing 0
Cited 1

Applying collocation segmentation to the ACL anthology reference corpus

ACL '12 Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries

Abstract

Our research focuses on the identification of word usage constraints from large text corpora. Such constraints are important for natural language systems, both for selecting vocabulary in language generation and for disambiguating lexical meaning in interpretation. The first stage of our research involves developing systems that can automatically extract such constraints from corpora, along with empirical methods for analyzing text. Identified constraints will be represented in a lexicon that will be tested computationally as part of a natural language system. We are also identifying lexical constraints for machine translation, using the aligned Hansard corpus as training data, and are identifying many-to-many word alignments.