A vocabulary development and visualization tool based on natural language processing and the mining of textual patient reports

Authors:
Carol Friedman;Hongfang Liu;Lyudmila Shagina
Affiliations:
Department of Medical Informatics, Columbia University, 622 West 168 Street, VC-5 Bldg, New York, NY;Department of Medical Informatics, Columbia University, 622 West 168 Street, VC-5 Bldg, New York, NY;Department of Medical Informatics, Columbia University, 622 West 168 Street, VC-5 Bldg, New York, NY
Venue:
Journal of Biomedical Informatics
Year:
2003

Abstract

Medical terminologies are critical for automated healthcare systems. Some terminologies, such as the UMLS and SNOMED, are comprehensive, whereas others specialize in limited domains (e.g., BIRADS) or are developed for specific applications. Important features of a terminology are comprehensive coverage of relevant clinical terms and ease of use by its users, which include computerized applications. We have developed a method for facilitating vocabulary development and maintenance that uses natural language processing to mine large collections of clinical reports in order to obtain information on terminology as expressed by physicians. Once the reports are processed and the terms are structured and collected into an XML representational schema, it is possible to determine information about terms, such as frequency of occurrence, compositionality, relations to other terms (such as modifiers), and correspondence to a controlled vocabulary. This paper describes the method and discusses how it can be used as a tool to help vocabulary builders navigate through the terms physicians use, visualize their relations to other terms via a flexible viewer, and determine their correspondence to a controlled vocabulary.
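
As a rough illustration of the kind of information the abstract mentions (frequency of occurrence, relations to modifiers, and correspondence to a controlled vocabulary), the following minimal Python sketch tallies terms from structured XML output. The element names (`report`, `term`, `modifier`), the attributes (`value`, `type`, `code`), the codes, and the sample data are all hypothetical and chosen only for illustration; they are not the schema or the implementation described in the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical structured NLP output. The tags, attributes, and codes
# below are invented for this sketch and are NOT the paper's schema.
SAMPLE_XML = """
<reports>
  <report id="r1">
    <term type="finding" value="mass" code="CODE-001">
      <modifier type="bodyloc" value="breast"/>
      <modifier type="size" value="small"/>
    </term>
    <term type="finding" value="calcification">
      <modifier type="bodyloc" value="breast"/>
    </term>
  </report>
  <report id="r2">
    <term type="finding" value="mass" code="CODE-001">
      <modifier type="bodyloc" value="breast"/>
    </term>
  </report>
</reports>
"""


def summarize_terms(xml_text):
    """Collect per-term statistics from the (assumed) XML structure.

    Returns a tuple of:
      frequency: Counter mapping term -> number of occurrences
      modifiers: dict mapping term -> Counter of (modifier type, value)
      codes:     dict mapping term -> controlled-vocabulary code, if any
      unmapped:  sorted list of terms that carry no code
    """
    root = ET.fromstring(xml_text.strip())
    frequency = Counter()
    modifiers = defaultdict(Counter)
    codes = {}

    for term in root.iter("term"):
        name = term.get("value")
        frequency[name] += 1
        # Record which modifiers co-occur with this term.
        for mod in term.findall("modifier"):
            modifiers[name][(mod.get("type"), mod.get("value"))] += 1
        # A "code" attribute stands in for a controlled-vocabulary match.
        code = term.get("code")
        if code is not None:
            codes[name] = code

    unmapped = sorted(t for t in frequency if t not in codes)
    return frequency, modifiers, codes, unmapped


if __name__ == "__main__":
    frequency, modifiers, codes, unmapped = summarize_terms(SAMPLE_XML)
    print("Term frequencies:", frequency.most_common())
    print("Modifiers of 'mass':", dict(modifiers["mass"]))
    print("Terms without a vocabulary code:", unmapped)
```

In this sketch, terms that lack a code are listed separately, since those are the candidates a vocabulary builder would presumably want to review first.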