DILEMMA-2: a lemmatizer-tagger for medical abstracts

  • Authors:
  • Hans Paulussen; Willy Martin

  • Affiliations:
  • Facultés Universitaires Notre-Dame de la Paix, Namur, Belgium; Vrije Universiteit, Amsterdam, The Netherlands

  • Venue:
  • ANLC '92 Proceedings of the third conference on Applied natural language processing
  • Year:
  • 1992

Abstract

This paper reports on the development of DILEMMA-2, a lemmatizer-tagger for the sublanguage of medical abstracts. The program is an extension of DILEMMA-1, a lemmatizer-tagger for general English texts.

The first section gives a brief outline of DILEMMA-1, with particular attention to the original concept of a default category, which is linked to a categorial graph by means of a pointer system. The second section shows why DILEMMA-1 could not achieve a suitable score when lemmatizing medical abstracts, the main reason being its inability to recognize sublanguage-specific vocabulary. The next section describes the most important errors along with their solutions; these errors are categorized as either gaps or wrong assignments. The former could be dealt with in either a suffix list or a gaps filler default. The latter mainly concerned wrongly assigned past participles and errors in noun, verb or adjective assignment.

After implementation of the proposed solutions, the results of DILEMMA-1 and DILEMMA-2 are compared. The comparison shows that the results of DILEMMA-1 have been improved substantially within a sublanguage context, and that this was achieved by using linguistic (i.e. sublanguage) knowledge rather than ad hoc remedies.
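
The abstract does not say how the suffix list and the gaps filler default are implemented. As a rough illustration only, the idea of filling gaps for out-of-lexicon words might be sketched as below. Everything in this sketch is an assumption and not taken from the paper: the suffix entries, the tag names, the choice of NOUN as the fallback, and the function name `guess_tag`.

```python
# Illustrative sketch only; not the DILEMMA-2 implementation.
# The suffixes, tags, and default below are hypothetical examples.
SUFFIX_TAGS = [
    ("ectomy", "NOUN"),
    ("itis", "NOUN"),
    ("ous", "ADJ"),
    ("ize", "VERB"),
]

# Assumed fallback tag for words that match neither lexicon nor suffix list.
DEFAULT_TAG = "NOUN"


def guess_tag(word, lexicon):
    """Return a tag for `word`: lexicon first, then suffix list, then default."""
    lower = word.lower()
    if lower in lexicon:
        return lexicon[lower]
    # Try longer suffixes first so the most specific match wins.
    for suffix, tag in sorted(SUFFIX_TAGS, key=lambda p: len(p[0]), reverse=True):
        if lower.endswith(suffix):
            return tag
    return DEFAULT_TAG


if __name__ == "__main__":
    toy_lexicon = {"the": "DET", "patient": "NOUN"}
    for w in ["the", "Appendicitis", "fibrous", "xyz"]:
        print(w, guess_tag(w, toy_lexicon))
```

The ordering (lexicon, then suffix list, then default) is a design choice made for this sketch: a word missing from the general lexicon still receives a category, which is the kind of gap the abstract describes for sublanguage vocabulary.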