Mining disease-specific molecular association profiles from biomedical literature: a case study

  • Authors:
  • Jiao Li;Xiaoyan Zhu;Jake Yue Chen

  • Affiliations:
  • Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Purdue University School of Science, Indianapolis, IN

  • Venue:
  • Proceedings of the 2008 ACM symposium on Applied computing
  • Year:
  • 2008

Abstract

We developed a new literature mining paradigm with the ultimate goal of enabling knowledge discovery in molecular association profiles generated from literature and prior knowledge. We show how to implement the paradigm by building a prototype literature mining framework and performing molecule-bioGist association mining. The framework consists of two modules. The first module, Textual Data Mining, takes synonym-expanded disease-related molecule names as input and outputs a list of bioGists. The second module, Structured Data Mining, takes two inputs, the initial disease-related molecular query terms and the bioGist list extracted by the first module, and outputs a molecule-bioGist association matrix. Our approach is novel because biomedical literature mining is used here not only as an "information retrieval" tool but also as a "hypothesis generation and validation" platform. We applied the framework to a molecular pharmacology study of breast cancer. Based on 214 breast cancer-related proteins, 429,067 MEDLINE abstracts were retrieved, and 4,491 drug compounds were identified as bioGists. We evaluated 172 hydrocarbons in this bioGist list and found that more than 82.5% of them were verified to be related to breast cancer. BRCA1 and BRCA2 were found to have similar profiles in drug compound studies, and "doxorubicin", "etoposide", and "paclitaxel" were found to have similar pharmacological profiles in the treatment of breast cancer.
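To make the idea of a molecule-bioGist association matrix and of "similar profiles" more concrete, the following is a minimal Python sketch. It is not the authors' implementation. The abstract does not say how associations are scored or how profiles are compared, so the sketch assumes simple per-abstract co-occurrence counts and cosine similarity between matrix rows. The function names (`build_association_matrix`, `cosine_similarity`) and all input strings (`mol_a`, `drug_x`, and so on) are hypothetical placeholders invented for illustration; they are not data from the study.

```python
from math import sqrt


def build_association_matrix(abstracts, molecules, biogists):
    """Count, for each molecule/bioGist pair, the abstracts mentioning both.

    Assumption: an "association" is plain co-occurrence within one abstract,
    detected by exact whitespace-token matching. The paper may use a
    different scoring scheme.
    """
    matrix = {m: {g: 0 for g in biogists} for m in molecules}
    for text in abstracts:
        tokens = set(text.lower().split())
        present_molecules = [m for m in molecules if m.lower() in tokens]
        present_biogists = [g for g in biogists if g.lower() in tokens]
        for m in present_molecules:
            for g in present_biogists:
                matrix[m][g] += 1
    return matrix


def cosine_similarity(row_a, row_b):
    """Cosine similarity between two matrix rows that share the same keys."""
    dot = sum(row_a[k] * row_b[k] for k in row_a)
    norm_a = sqrt(sum(v * v for v in row_a.values()))
    norm_b = sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Invented placeholder inputs, not real abstracts or real findings.
abstracts = [
    "mol_a responds to drug_x and drug_y",
    "mol_b responds to drug_x and drug_y",
    "mol_c interacts with drug_z",
    "mol_a and mol_b both bind drug_x",
]
molecules = ["mol_a", "mol_b", "mol_c"]
biogists = ["drug_x", "drug_y", "drug_z"]

matrix = build_association_matrix(abstracts, molecules, biogists)
sim_ab = cosine_similarity(matrix["mol_a"], matrix["mol_b"])
sim_ac = cosine_similarity(matrix["mol_a"], matrix["mol_c"])
```

In this toy input, `mol_a` and `mol_b` co-occur with the same bioGists equally often, so their rows point in the same direction and their similarity is close to 1, while `mol_a` and `mol_c` share no bioGist and score 0. Comparing columns instead of rows would give the analogous comparison between bioGists, such as drug compounds.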