Surface grammatical analysis for the extraction of terminological noun phrases

  • Authors:
  • Didier Bourigault

  • Affiliations:
  • Ecole des Hautes Etudes en Sciences Sociales and Electricité de France, Clamart, France

  • Venue:
  • COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 3
  • Year:
  • 1992

Abstract

LEXTER is a software package for extracting terminology. A corpus of French-language texts on any subject field is fed in, and LEXTER produces a list of likely terminological units to be submitted to an expert for validation. To identify the terminological units, LEXTER takes their form into account and proceeds in two main stages: analysis and parsing. In the first stage, LEXTER uses a base of rules designed to identify frontier markers, with a view to analysing the texts and extracting maximal-length noun phrases. In the second stage, LEXTER parses these maximal-length noun phrases to extract subgroups which, by virtue of their grammatical structure and their place in the maximal-length noun phrases, are likely to be terminological units. This article highlights the type of analysis used (surface grammatical analysis), as well as the methodological approach adopted to adapt the rules (an experimental approach).
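
The two stages described above can be illustrated with a toy sketch. It is not LEXTER's rule base, which the abstract does not spell out. The tag names, the set of frontier tags, the rule of splitting at prepositions, and the function names `maximal_noun_phrases` and `candidate_subgroups` are all assumptions made for illustration, and the input is assumed to be already tagged with parts of speech.

```python
# Toy sketch only: the tag set and both rules are illustrative assumptions,
# not the rules actually used by LEXTER.

# Hypothetical frontier markers: categories assumed never to occur inside a noun phrase.
FRONTIER_TAGS = {"VERB", "PRON", "CONJ", "ADV", "PUNCT"}


def maximal_noun_phrases(tagged):
    """Stage 1 (analysis): cut the tagged text at frontier markers and keep
    the remaining chunks that contain at least one noun."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in FRONTIER_TAGS:
            if current:
                chunks.append(current)
            current = []
        else:
            current.append((word, tag))
    if current:
        chunks.append(current)
    return [c for c in chunks if any(tag == "NOUN" for _, tag in c)]


def strip_leading_determiners(tokens):
    """Drop determiners at the start of a token sequence."""
    i = 0
    while i < len(tokens) and tokens[i][1] == "DET":
        i += 1
    return tokens[i:]


def candidate_subgroups(phrase):
    """Stage 2 (parsing): propose the whole phrase and its preposition-delimited
    parts as candidate terminological units (an assumed, simplified rule)."""
    segments, current = [], []
    for word, tag in phrase:
        if tag == "PREP":
            if current:
                segments.append(current)
            current = []
        else:
            current.append((word, tag))
    if current:
        segments.append(current)

    candidates = []
    for tokens in [phrase] + segments:
        text = " ".join(word for word, _ in strip_leading_determiners(tokens))
        if text and text not in candidates:
            candidates.append(text)
    return candidates


# "La pompe de refroidissement du circuit primaire alimente le réservoir."
tagged = [
    ("La", "DET"), ("pompe", "NOUN"), ("de", "PREP"), ("refroidissement", "NOUN"),
    ("du", "PREP"), ("circuit", "NOUN"), ("primaire", "ADJ"),
    ("alimente", "VERB"), ("le", "DET"), ("réservoir", "NOUN"), (".", "PUNCT"),
]

phrases = maximal_noun_phrases(tagged)
results = [candidate_subgroups(p) for p in phrases]
for candidates in results:
    print(candidates)
```

In this sketch the verb "alimente" and the final full stop act as frontier markers, so the sentence yields two maximal-length noun phrases. The first is then broken down into shorter candidates such as "circuit primaire". Whether any candidate is a genuine term is left to the expert, as in the workflow the abstract describes.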