This paper reports on the development of DILEMMA-2, a lemmatizer-tagger for the sublanguage of medical abstracts. The program is an extension of DILEMMA-1, a lemmatizer-tagger for general English texts.

The first section gives a brief outline of DILEMMA-1. Particular attention is paid to the original concept of a default category, which is linked to a categorial graph by means of a pointer system. The second section shows why DILEMMA-1 failed to achieve a satisfactory score when lemmatizing medical abstracts, the main reason being its inability to recognize sublanguage-specific vocabulary. The next section describes the most important errors along with their solutions; these errors are categorized as either gaps or wrong assignments. Gaps could be dealt with by either a suffix list or a gap-filler default. Wrong assignments mainly concerned wrongly assigned past participles and errors in noun, verb, or adjective assignment.

After the proposed solutions were implemented, the results of DILEMMA-1 and DILEMMA-2 are compared. The comparison shows that the results of DILEMMA-1 are improved substantially within a sublanguage context, and that this is achieved by using linguistic (i.e. sublanguage) knowledge rather than ad hoc remedies.
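
The two remedies for gaps mentioned above, a suffix list and a fallback default, can be illustrated with a minimal sketch. This is not the DILEMMA-2 implementation: the suffix entries, the category labels, the choice of default, and the function name `guess_category` are all assumptions made for illustration only.

```python
# Hypothetical suffix list: (suffix, category) pairs.
# These entries are illustrative and are NOT taken from DILEMMA-2.
SUFFIX_CATEGORIES = [
    ("ectomy", "NOUN"),
    ("itis", "NOUN"),
    ("osis", "NOUN"),
    ("ous", "ADJ"),
    ("al", "ADJ"),
]

# Hypothetical fallback used when neither the lexicon nor the suffix list applies.
DEFAULT_CATEGORY = "NOUN"


def guess_category(word, lexicon):
    """Return a category for `word`: lexicon first, then suffix list, then default."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    # Try longer suffixes first so that "ectomy" is checked before shorter ones.
    by_length = sorted(SUFFIX_CATEGORIES, key=lambda pair: len(pair[0]), reverse=True)
    for suffix, category in by_length:
        # Require at least one character before the suffix.
        if key.endswith(suffix) and len(key) > len(suffix):
            return category
    return DEFAULT_CATEGORY


# A toy general-language lexicon (assumed for the example).
lexicon = {"the": "DET", "patient": "NOUN", "shows": "VERB"}

for token in ["the", "hepatitis", "venous", "xyz"]:
    print(token, guess_category(token, lexicon))
```

The ordering (lexicon, then suffix list, then default) is one plausible way to ensure that a word missing from the general lexicon still receives a category; how DILEMMA-2 actually combines these resources is not specified in the abstract.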