Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches

Authors:
Yanxin Lu;Hua Xu;Neeraja B. Peterson;Qi Dai;Min Jiang;Joshua C. Denny;Mei Liu
Affiliations:
Department of Human Anatomy, Histology and Embryology, Fudan University, 138 Yi Xue Yuan Road, Shanghai 200032, China/ Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted In ...;Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA;Division of General Internal Medicine and Public Health, Department of Medicine, Vanderbilt University, Suite 6000 Medical Centre East, North Tower, Nashville, TN 37232, USA;Division of Epidemiology, Department of Medicine, Vanderbilt University, 2525 West End Avenue, Nashville, TN 37203-1738, USA;Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA;Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA;Department of Biomedical Informatics, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA
Venue:
International Journal of Data Mining and Bioinformatics
Year:
2012

Abstract

Much epidemiologic information resides in the literature in a form that is not computable. To extract this information and build knowledge bases of epidemiologic studies, we developed a system that extracts noun phrases describing epidemiologic exposures and outcomes. The system consists of two components: a natural language processing (NLP) engine and a machine learning (ML) based classifier. Four ML algorithms were applied and compared over different feature sets. To evaluate the system, we manually constructed an annotated dataset. The system achieved its highest F-measures of 82.0% for extracting exposure terms and 70% for extracting outcome terms.
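
The abstract reports results as F-measures for each term type. As a minimal sketch of how such a score could be computed, the Python below calculates a balanced F-measure (F1) for one label over a set of candidate noun phrases. The abstract does not describe the paper's evaluation protocol, so the label names, the choice of balanced F1, and the toy data are assumptions made for illustration, not the authors' method or results.

```python
from typing import Sequence


def f_measure(gold: Sequence[str], predicted: Sequence[str], target: str) -> float:
    """Balanced F-measure (F1) for one label; every other label counts as negative."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)  # true positives
    fp = sum(1 for g, p in pairs if g != target and p == target)  # false positives
    fn = sum(1 for g, p in pairs if g == target and p != target)  # false negatives
    if tp == 0:
        # No correct predictions for this label: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for six candidate noun phrases (illustrative only).
gold = ["exposure", "outcome", "other", "exposure", "outcome", "other"]
predicted = ["exposure", "other", "other", "exposure", "outcome", "exposure"]

exposure_f1 = f_measure(gold, predicted, "exposure")
outcome_f1 = f_measure(gold, predicted, "outcome")
print(f"exposure F1 = {exposure_f1:.3f}, outcome F1 = {outcome_f1:.3f}")
```

Scoring each label separately mirrors the way the abstract reports exposure and outcome terms as two distinct figures rather than a single pooled number.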