Comparing corpora and lexical ambiguity

  • Authors: Patrick Ruch; Arnaud Gaudinat
  • Affiliations: Geneva University Hospital, Switzerland; University of Geneva, Switzerland
  • Venue: CompareCorpora '00 Proceedings of the Workshop on Comparing Corpora
  • Year: 2000

Abstract

In this paper we compare two types of corpus, focusing on the lexical ambiguity of each. The first corpus consists mainly of newspaper articles and literature excerpts, while the second belongs to the medical domain. To conduct the study, we use two different disambiguation tools. We first verify the performance of each system in its respective application domain. We then use these systems to assess and compare both the general ambiguity rate and the particularities of each domain. Quantitative results show that medical documents are lexically less ambiguous than unrestricted documents. Our conclusions highlight the importance of the application area in the design of NLP tools.
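
As a rough illustration of what an ambiguity rate can measure, the sketch below counts the share of tokens that a lexicon maps to more than one part-of-speech tag. This is an assumption made for illustration only: the abstract does not define the measure, and the function name `ambiguity_rate`, the toy lexicon, and the sample token lists are all hypothetical, not data or tools from the paper.

```python
from typing import Dict, List, Set


def ambiguity_rate(tokens: List[str], lexicon: Dict[str, Set[str]]) -> float:
    """Return the share of tokens whose lexicon entry lists more than one tag.

    Assumption: tokens missing from the lexicon count as unambiguous.
    """
    if not tokens:
        return 0.0
    ambiguous = sum(
        1 for token in tokens if len(lexicon.get(token.lower(), set())) > 1
    )
    return ambiguous / len(tokens)


# Hypothetical toy lexicon: word form -> set of candidate tags.
TOY_LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "patient": {"NOUN", "ADJ"},
    "shows": {"VERB", "NOUN"},
    "fever": {"NOUN"},
}

# Hypothetical token samples standing in for the two corpora.
general_tokens = ["the", "patient", "shows"]
medical_tokens = ["the", "patient", "fever", "fever"]

general_rate = ambiguity_rate(general_tokens, TOY_LEXICON)
medical_rate = ambiguity_rate(medical_tokens, TOY_LEXICON)
```

Comparing `general_rate` with `medical_rate` mirrors the kind of corpus-level comparison the abstract describes, although a real study would rely on full lexicons and on the output of validated disambiguation tools rather than a hand-written dictionary.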