Towards new information resources for public health: from WORDNET to MEDICALWORDNET

Authors:
Christiane Fellbaum;Udo Hahn;Barry Smith
Affiliations:
Department of Psychology, Princeton University Green Hall, Princeton, NJ and Berlin-Brandenburg Academy of Science, Berlin, Germany;Jena University Language and Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany;IFOMIS, Universität des Saarlandes, Saarbrücken, Germany and Department of Philosophy, State University of New York at Buffalo
Venue:
Journal of Biomedical Informatics - Special issue: Biomedical ontologies
Year:
2006

Abstract

In the last two decades, WORDNET has evolved into the most comprehensive computational lexicon of general English. In this article, we discuss its potential for supporting the creation of an entirely new kind of information resource for public health, viz. MEDICALWORDNET. This resource is not to be conceived merely as a lexical extension of the original WORDNET to medical terminology; indeed, there is already a considerable degree of overlap between WORDNET and the vocabulary of medicine. Instead, we propose a new type of repository, consisting of three large collections of (1) medically relevant word forms, structured along the lines of the existing Princeton WORDNET; (2) medically validated propositions, referred to here as medical facts, which will constitute what we shall call MEDICALFACTNET; and (3) propositions reflecting laypersons' medical beliefs, which will constitute what we shall call MEDICALBELIEFNET. We introduce a methodology for setting up MEDICALWORDNET. We then turn to a discussion of the research challenges that have to be met to build this new type of information resource.

We build a database of sentences relevant to the medical domain. The sentences are generated from WORDNET via its relations as well as from medical statements broken down into elementary propositions. Two subcorpora of sentences are distinguished, MEDICALBELIEFNET and MEDICALFACTNET. The former is rated for assent by laypersons; the latter for correctness by medical experts. The sentence corpora will be valuable for a variety of applications in information retrieval as well as in research in linguistics and psychology with respect to the study of expert and non-expert beliefs and their linguistic expressions. Our work has to meet several considerable challenges. These include accounting for the distinction between medical experts and laypersons, the social issues of expert-layperson communication in different media, the linguistic aspects of encoding medical knowledge, and the reliability, volume, and emergence of medical knowledge. The work described here has been tested in a small pilot experiment [39] and awaits large-scale implementation.
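
The corpus construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy `HYPERNYMS` table stands in for real WordNet relations, the sentence template, the `Sentence` record, and the simple majority rule in `majority` are hypothetical, and all ratings are invented. The abstract does not say how ratings are aggregated.

```python
from dataclasses import dataclass, field

# Toy stand-in for WordNet hypernym ("is a kind of") links.
# Hypothetical data; a real system would read these from WordNet.
HYPERNYMS = {
    "influenza": "infectious disease",
    "aspirin": "analgesic",
}


@dataclass
class Sentence:
    text: str
    # True/False votes; both rating lists are invented for this sketch.
    expert_correct: list = field(default_factory=list)  # medical experts
    lay_assent: list = field(default_factory=list)      # laypersons


def generate_sentences(relations):
    """Turn each (term, hypernym) link into one elementary sentence."""
    return [
        Sentence(f"{term.capitalize()} is a kind of {hypernym}.")
        for term, hypernym in relations.items()
    ]


def majority(votes):
    """Assumed aggregation rule: strictly more than half of the votes are True."""
    return sum(votes) > len(votes) / 2


def split_corpus(sentences):
    """Return (fact_net, belief_net) as lists of sentence texts."""
    fact_net = [s.text for s in sentences if majority(s.expert_correct)]
    belief_net = [s.text for s in sentences if majority(s.lay_assent)]
    return fact_net, belief_net


corpus = generate_sentences(HYPERNYMS)
# A sentence taken from a medical statement rather than from a relation.
corpus.append(Sentence("Antibiotics cure the common cold."))

corpus[0].expert_correct, corpus[0].lay_assent = [True, True, True], [True, True, False]
corpus[1].expert_correct, corpus[1].lay_assent = [True, True, True], [False, False, True]
corpus[2].expert_correct, corpus[2].lay_assent = [False, False, False], [True, True, False]

fact_net, belief_net = split_corpus(corpus)
print("MedicalFactNet:", fact_net)
print("MedicalBeliefNet:", belief_net)
```

In this toy run a sentence can land in both subcorpora, in only one, or in neither, which is the kind of expert versus non-expert divergence the abstract proposes to study.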