Towards new information resources for public health: from WORDNET to MEDICALWORDNET

  • Authors:
  • Christiane Fellbaum;Udo Hahn;Barry Smith

  • Affiliations:
  • Department of Psychology, Princeton University, Green Hall, Princeton, NJ and Berlin-Brandenburg Academy of Science, Berlin, Germany;Jena University Language and Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany;IFOMIS, Universität des Saarlandes, Saarbrücken, Germany and Department of Philosophy, State University of New York at Buffalo

  • Venue:
  • Journal of Biomedical Informatics - Special issue: Biomedical ontologies
  • Year:
  • 2006

Abstract

In the last two decades, WORDNET has evolved into the most comprehensive computational lexicon of general English. In this article, we discuss its potential for supporting the creation of an entirely new kind of information resource for public health, viz. MEDICALWORDNET. This resource is not to be conceived merely as a lexical extension of the original WORDNET to medical terminology; indeed, there is already a considerable degree of overlap between WORDNET and the vocabulary of medicine. Instead, we propose a new type of repository, consisting of three large collections of (1) medically relevant word forms, structured along the lines of the existing Princeton WORDNET; (2) medically validated propositions, referred to here as medical facts, which will constitute what we shall call MEDICALFACTNET; and (3) propositions reflecting laypersons' medical beliefs, which will constitute what we shall call MEDICALBELIEFNET. We introduce a methodology for setting up MEDICALWORDNET. We then turn to the research challenges that have to be met to build this new type of information resource.

We build a database of sentences relevant to the medical domain. The sentences are generated from WORDNET via its relations as well as from medical statements broken down into elementary propositions. Two subcorpora of sentences are distinguished, MEDICALBELIEFNET and MEDICALFACTNET. The former is rated for assent by laypersons; the latter for correctness by medical experts. The sentence corpora will be valuable for a variety of applications in information retrieval as well as in research in linguistics and psychology on expert and non-expert beliefs and their linguistic expressions. Our work has to meet several considerable challenges. These include accounting for the distinction between medical experts and laypersons, the social issues of expert-layperson communication in different media, the linguistic aspects of encoding medical knowledge, and the reliability, volume, and emergence of medical knowledge. The work described here has been tested in a small pilot experiment [39] and awaits large-scale implementation.
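
The abstract describes generating sentences from lexical relations and then sorting them by layperson assent and expert correctness. The following is a minimal sketch of that idea, not the authors' implementation: the relation triples, the sentence templates, the `Sentence` record, and the majority-vote rule are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical toy triples standing in for WordNet-style relations.
# They are illustrative only and are not taken from the paper.
RELATIONS = [
    ("influenza", "hypernym", "disease"),
    ("ventricle", "part_of", "heart"),
]

# Assumed sentence templates, one per relation type.
TEMPLATES = {
    "hypernym": "{a} is a kind of {b}.",
    "part_of": "{a} is a part of the {b}.",
}


@dataclass
class Sentence:
    text: str
    lay_assent: list = field(default_factory=list)      # layperson votes (bool)
    expert_correct: list = field(default_factory=list)  # expert votes (bool)


def generate_sentences(relations):
    """Turn each (term, relation, term) triple into one candidate sentence."""
    return [
        Sentence(TEMPLATES[rel].format(a=a.capitalize(), b=b))
        for a, rel, b in relations
        if rel in TEMPLATES
    ]


def majority(votes):
    """True when more than half of a non-empty vote list is positive."""
    return bool(votes) and sum(votes) > len(votes) / 2


def split_corpora(sentences):
    """Assumed rule: majority lay assent -> belief set, majority expert approval -> fact set."""
    belief = [s for s in sentences if majority(s.lay_assent)]
    fact = [s for s in sentences if majority(s.expert_correct)]
    return belief, fact


if __name__ == "__main__":
    corpus = generate_sentences(RELATIONS)
    corpus[0].lay_assent += [True, True, False]
    corpus[0].expert_correct += [True]
    belief, fact = split_corpora(corpus)
    print([s.text for s in belief], [s.text for s in fact])
```

A sentence can land in both sets, in one, or in neither; keeping the two ratings separate is what would let the corpora be compared for agreement between laypersons and experts.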