Comparing corpora and lexical ambiguity

Authors:
Patrick Ruch; Arnaud Gaudinat
Affiliations:
Geneva University Hospital, Switzerland; University of Geneva, Switzerland
Venue:
CompareCorpora '00 Proceedings of the Workshop on Comparing Corpora
Year:
2000



Abstract

In this paper we compare two types of corpus, focusing on the lexical ambiguity of each. The first corpus consists mainly of newspaper articles and literature excerpts, while the second belongs to the medical domain. To conduct the study, we use two different disambiguation tools. We first verify the performance of each system in its respective application domain. We then use these systems to assess and compare both the general ambiguity rate and the particularities of each domain. Quantitative results show that medical documents are lexically less ambiguous than unrestricted documents. Our conclusions show the importance of the application area in the design of NLP tools.
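
As a rough illustration of what a "general ambiguity rate" might look like in practice, the sketch below counts the share of tokens that have more than one candidate part-of-speech tag in a lexicon. This is only an assumed definition for illustration; the abstract does not state how the authors compute the rate. The function name `ambiguity_rate`, the tag names, the toy lexicon, and the two sample sentences are all hypothetical and are not taken from the paper or its corpora.

```python
from typing import Dict, List, Set


def ambiguity_rate(tokens: List[str], lexicon: Dict[str, Set[str]]) -> float:
    """Return the fraction of known tokens with more than one candidate tag.

    Assumed definition, for illustration only: tokens missing from the
    lexicon are ignored, and matching is case-insensitive.
    """
    known = [t.lower() for t in tokens if t.lower() in lexicon]
    if not known:
        return 0.0
    ambiguous = sum(1 for t in known if len(lexicon[t]) > 1)
    return ambiguous / len(known)


# Hypothetical toy lexicon mapping word forms to candidate tags.
toy_lexicon: Dict[str, Set[str]] = {
    "the": {"DET"},
    "light": {"NOUN", "ADJ", "VERB"},
    "shows": {"VERB", "NOUN"},
    "patient": {"NOUN", "ADJ"},
    "has": {"VERB"},
    "fever": {"NOUN"},
}

# Hypothetical sample sentences standing in for the two corpus types.
general_tokens = "The light shows light".split()
medical_tokens = "The patient has fever".split()

general_rate = ambiguity_rate(general_tokens, toy_lexicon)
medical_rate = ambiguity_rate(medical_tokens, toy_lexicon)
print(f"general: {general_rate:.2f}, medical: {medical_rate:.2f}")
```

The toy numbers carry no evidential weight; they only show how the same measure can be applied to two token streams so that the resulting rates are directly comparable.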