Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units

  • Authors:
  • Joaquim Ferreira da Silva;Gaël Dias;Sylvie Guilloré;José Gabriel Pereira Lopes

  • Affiliations:
  • -;-;-;-

  • Venue:
  • EPIA '99 Proceedings of the 9th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
  • Year:
  • 1999

Quantified Score

Hi-index 0.01

Visualization

Abstract

The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, IR and IE. In this paper we propose two new association measures, the Symmetric Conditional Probability (SCP) and the Mutual Expectation (ME) for the extraction of contiguous and noncontiguous MWUs. Both measures are used by a new algorithm, the LocalMaxs, that requires neither empirically obtained thresholds nor complex linguistic filters. We assess the results obtained by both measures by comparing them with reference association measures (Specific Mutual Information, Φ2, Dice and Log-Likelihood coefficients) over a multilingual parallel corpus. An additional experiment has been carried out over a part-of-speech tagged Portuguese corpus for extracting contiguous compound verbs.