Literature mining on pharmacokinetics numerical data: A feasibility study

Authors:
Zhiping Wang;Seongho Kim;Sara K. Quinney;Yingying Guo;Stephen D. Hall;Luis M. Rocha;Lang Li
Affiliations:
Division of Biostatistics, Department of Medicine, School of Medicine, Indiana University, 410 West 10th Street, Suite 3044, Indianapolis, IN 46202, USA;Division of Biostatistics, Department of Medicine, School of Medicine, Indiana University, 410 West 10th Street, Suite 3044, Indianapolis, IN 46202, USA;Division of Biostatistics, Department of Medicine, School of Medicine, Indiana University, 410 West 10th Street, Suite 3044, Indianapolis, IN 46202, USA;Eli Lilly and Company, Indianapolis, IN, USA;Eli Lilly and Company, Indianapolis, IN, USA;School of Informatics, Indiana University, Bloomington, IN, USA and Instituto Gulbenkian de Ciencia, Oeiras, Portugal;Division of Biostatistics, Department of Medicine, School of Medicine, Indiana University, 410 West 10th Street, Suite 3044, Indianapolis, IN 46202, USA
Venue:
Journal of Biomedical Informatics
Year:
2009

Abstract

A feasibility study of literature mining is conducted on numerical drug pharmacokinetic (PK) parameter data using a sequential mining strategy. First, an entity template library is built to retrieve pharmacokinetics-relevant articles. Then a set of tagging and extraction rules is applied to retrieve PK data from the article abstracts. To estimate the population-average mean and between-study variance of the PK parameters, a linear mixed meta-analysis model and an E-M algorithm are developed to describe the probability distributions of the PK parameters. Finally, a cross-validation procedure is developed to ascertain false-positive mining results. Using this approach to mine midazolam (MDZ) PK data, an 88% precision rate and a 92% recall rate are achieved, giving an F-score of 90%. It greatly outperforms a conventional data mining approach (support vector machine), which has an F-score of 68.1%. Further investigation of 7 more drugs reveals comparable performance of the sequential mining approach.
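
The reported F-score can be checked against the reported precision and recall. The sketch below assumes the abstract uses the balanced F-score (F1, the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is the balanced F1 variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for midazolam (MDZ).
mdz_f = f_score(0.88, 0.92)
print(f"MDZ F-score: {mdz_f:.3f}")
```

Under that assumption, 2 × 0.88 × 0.92 / (0.88 + 0.92) is about 0.8996, which rounds to the 90% stated in the abstract.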