An endogeneous corpus-based method for structural noun phrase disambiguation

Authors:
Didier Bourigault
Affiliations:
Electricité de France - Direction des Etudes et Recherches, Service Informatique et Mathématiques Appliquées, Clamart, France
Venue:
EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
Year:
1993

Abstract

In this paper, we describe a method for structural noun phrase disambiguation that relies mainly on examining the text corpus under analysis and does not need to integrate any domain-dependent lexico- or syntactico-semantic information. This method is implemented in the Terminology Extraction Software LEXTER. We first explain why integrating LEXTER into the LEXTER-K project, which aims to build a tool for knowledge extraction from large technical text corpora, requires improving the quality of the terminology extracted by LEXTER. We then briefly describe how LEXTER works and show what kind of disambiguation it has to perform when parsing "maximal-length" noun phrases. We introduce a disambiguation method based on a very simple idea: whenever LEXTER has to choose among several competing noun sub-groups in order to disambiguate a maximal-length noun phrase, it checks, for each of these sub-groups, whether it occurs anywhere else in the corpus in a non-ambiguous situation, and then makes a choice. Analysis of a half-million-word corpus shows that this disambiguation strategy is efficient. The average rates are: 27% no disambiguation, 70% correct disambiguation, and 3% wrong disambiguation.
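The corpus-checking idea in the abstract can be illustrated with a small sketch. It is not LEXTER's implementation. It assumes a simple decision rule: if exactly one competing sub-group is attested elsewhere in the corpus in a non-ambiguous situation, that sub-group is chosen, and otherwise no disambiguation is made. The names `build_unambiguous_index` and `disambiguate`, the tuple representation of sub-groups, and the French example phrases are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple

# Hypothetical representation: a noun sub-group is a tuple of word forms.
SubGroup = Tuple[str, ...]


def build_unambiguous_index(unambiguous_subgroups: Iterable[SubGroup]) -> Counter:
    """Count how often each sub-group was found elsewhere in the corpus
    in a non-ambiguous situation (assumed to be collected beforehand)."""
    return Counter(unambiguous_subgroups)


def disambiguate(candidates: Sequence[SubGroup], index: Counter) -> Optional[SubGroup]:
    """Assumed decision rule: return the only candidate attested in the
    index; return None (no disambiguation) if zero or several are attested."""
    # Counter returns 0 for unseen keys without inserting them.
    attested = [c for c in candidates if index[c] > 0]
    if len(attested) == 1:
        return attested[0]
    return None


# Hypothetical usage: two competing sub-groups of one maximal-length phrase.
index = build_unambiguous_index([
    ("injection", "de", "sécurité"),
    ("injection", "de", "sécurité"),
    ("débit", "de", "fuite"),
])
candidates = [("pompe", "de", "injection"), ("injection", "de", "sécurité")]
print(disambiguate(candidates, index))
```

Leaving the phrase undecided when the corpus gives no unique evidence is consistent with the reported "no disambiguation" category. The reported rates also imply that, among the phrases on which a decision is made, about 96% (70 / 73) are decided correctly.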