Automatically extracting and representing collocations for language generation

Authors:
Frank A. Smadja; Kathleen R. McKeown
Affiliations:
Columbia University, New York, NY; Columbia University, New York, NY
Venue:
ACL '90: Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics
Year:
1990

Abstract

Collocational knowledge is necessary for language generation. The problem is that collocations come in a wide variety of forms: they can involve two, three, or more words; these words can belong to different syntactic categories; and they can combine more or less rigidly. This leads to two main difficulties: collocational knowledge must be acquired, and it must be represented flexibly so that it can be used for language generation. We address both problems in this paper, focusing on acquisition. We describe a program, Xtract, that automatically acquires a range of collocations from large textual corpora, and we describe how these collocations can be represented in a flexible lexicon using a unification-based formalism.