DILEMMA-2: a lemmatizer-tagger for medical abstracts

Authors:
Hans Paulussen;Willy Martin
Affiliations:
Facultés Universitaires Notre-Dame de la Paix, Namur, Belgium;Vrije Universiteit, Amsterdam, The Netherlands
Venue:
ANLC '92 Proceedings of the third conference on Applied natural language processing
Year:
1992

Abstract

This paper reports on the development of DILEMMA-2, a lemmatizer-tagger for the sublanguage of medical abstracts. The program is an extension of DILEMMA-1, a lemmatizer-tagger for general English texts. The first section gives a brief outline of DILEMMA-1, with particular attention to its original concept of a default category, which is linked to a categorial graph by means of a pointer system. The second section shows why DILEMMA-1 failed to achieve a satisfactory score when lemmatizing medical abstracts, the main reason being its inability to recognize sublanguage-specific vocabulary. The next section describes the most important errors together with their solutions; these errors are categorized as either gaps or wrong assignments. Gaps could be handled either by a suffix list or by a gap-filler default. Wrong assignments mainly concerned wrongly assigned past participles and errors in noun, verb or adjective assignment. After implementation of the proposed solutions, the results of DILEMMA-1 and DILEMMA-2 are compared, showing that DILEMMA-2 substantially improves on DILEMMA-1 within the sublanguage context, and that this was achieved by using linguistic (i.e. sublanguage) knowledge rather than ad hoc remedies.
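The abstract describes a layered treatment of unknown vocabulary: words the lexicon covers are lemmatized and tagged directly, while sublanguage words it misses ("gaps") are handled by a suffix list or, failing that, by a default category. The Python sketch below illustrates only that layering, under stated assumptions. The lexicon, the suffix rules, the lookup order and the choice of `NOUN` as the gap-filler default are all hypothetical toy data, not DILEMMA's actual resources. The sketch also does not model the categorial graph or pointer system, since the abstract gives no details of them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (not DILEMMA's): word form -> list of (lemma, tag) readings.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "the": [("the", "DET")],
    "for": [("for", "PREP")],
    "patients": [("patient", "NOUN")],
    "were": [("be", "VERB")],
    # Ambiguous form: simple past vs. past participle.
    "treated": [("treat", "VERB-PAST"), ("treat", "VERB-PPART")],
}

# Hypothetical sublanguage suffix list: (suffix, replacement, tag).
# The lemma is the word minus the suffix, plus the replacement.
SUFFIXES: List[Tuple[str, str, str]] = [
    ("ectomies", "ectomy", "NOUN"),  # appendectomies -> appendectomy
    ("ectomy", "ectomy", "NOUN"),    # appendectomy -> appendectomy
    ("omas", "oma", "NOUN"),         # carcinomas -> carcinoma
    ("itis", "itis", "NOUN"),        # arthritis -> arthritis
    ("ic", "ic", "ADJ"),             # hepatic -> hepatic
    ("al", "al", "ADJ"),             # renal -> renal
]

# Assumed gap-filler default category for words no other step recognizes.
DEFAULT_TAG = "NOUN"


def lemmatize_tag(word: str) -> List[Tuple[str, str]]:
    """Return the (lemma, tag) readings for one word form."""
    w = word.lower()
    # Step 1: lexicon lookup (copy the list so callers cannot mutate the lexicon).
    if w in LEXICON:
        return list(LEXICON[w])
    # Step 2: suffix list, trying longer suffixes first.
    for suffix, replacement, tag in sorted(SUFFIXES, key=lambda rule: -len(rule[0])):
        if w.endswith(suffix) and len(w) > len(suffix):
            return [(w[: -len(suffix)] + replacement, tag)]
    # Step 3: gap-filler default; the word form serves as its own lemma.
    return [(w, DEFAULT_TAG)]


if __name__ == "__main__":
    for token in "The patients were treated for hepatic carcinomas".split():
        print(token, lemmatize_tag(token))
```

Crude rules such as `-al` → `ADJ` would mis-tag a noun like *hospital*. Noun/adjective confusions of this sort are the kind of wrong assignment the abstract says had to be corrected using sublanguage knowledge.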