Discovering patterns to extract protein--protein interactions from full texts

  • Authors:
  • Minlie Huang;Xiaoyan Zhu;Yu Hao;Donald G. Payan;Kunbin Qu;Ming Li

  • Affiliations:
  • State Key Laboratory of Intelligent Technology and Systems (LITS), Department of Computer Science and Technology, University of Tsinghua, Beijing, 100084, China;State Key Laboratory of Intelligent Technology and Systems (LITS), Department of Computer Science and Technology, University of Tsinghua, Beijing, 100084, China;State Key Laboratory of Intelligent Technology and Systems (LITS), Department of Computer Science and Technology, University of Tsinghua, Beijing, 100084, China;Rigel Pharmaceuticals Inc, 1180 Veterans Blvd, South San Francisco, CA 94080, USA;Rigel Pharmaceuticals Inc, 1180 Veterans Blvd, South San Francisco, CA 94080, USA;Bioinformatics Laboratory, School of Computer Science, University of Waterloo, N2L 3G1, Ontario, Canada

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Abstract

Motivation: Although there are several databases storing protein--protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mine protein--protein interactions from biomedical texts.

Results: We present a novel and robust approach for extracting protein--protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and precision rate of 80.5%.

Availability: The program is available on request from the authors.
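
The idea of computing a pattern by aligning sentences with dynamic programming can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes sentences are already tokenized, that protein names have been replaced by a placeholder token (here `PROTEIN`) using a dictionary, and it substitutes a plain longest-common-subsequence alignment for whatever scoring the paper uses. The function name `align_pattern` and the `*` gap marker are hypothetical.

```python
from typing import List


def align_pattern(a: List[str], b: List[str], gap: str = "*") -> List[str]:
    """Align two token lists with an LCS-style dynamic program.

    Returns the shared tokens in order, with `gap` marking each stretch
    where the two sentences differ. Illustrative only.
    """
    n, m = len(a), len(b)
    # score[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                score[i][j] = score[i + 1][j + 1] + 1
            else:
                score[i][j] = max(score[i + 1][j], score[i][j + 1])

    # Trace forward through the table to recover the aligned pattern.
    pattern: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pattern.append(a[i])
            i += 1
            j += 1
        else:
            if not pattern or pattern[-1] != gap:
                pattern.append(gap)
            if score[i + 1][j] >= score[i][j + 1]:
                i += 1
            else:
                j += 1
    # Leftover tokens in either sentence also count as a differing stretch.
    if (i < n or j < m) and (not pattern or pattern[-1] != gap):
        pattern.append(gap)
    return pattern


s1 = "PROTEIN interacts directly with PROTEIN".split()
s2 = "PROTEIN interacts with the PROTEIN".split()
print(" ".join(align_pattern(s1, s2)))
```

For these two example sentences the shared skeleton keeps the key verb "interacts" and both protein placeholders, which is the kind of pattern a matching step could then apply to unseen sentences.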