KSPF: using gene sequence patterns and data mining for biological knowledge management

  • Authors:
  • Hei-Chia Wang;Hung-Chih Kuo;Hong-Hwa Chen;Yu-Yun Hsiao;Wen-Chieh Tsai

  • Affiliations:
  • Institute of Information Management, National Cheng Kung University, 1st University Road, Tainan, 701, Taiwan;Institute of Information Management, National Cheng Kung University, 1st University Road, Tainan, 701, Taiwan;Department of Life Sciences, National Cheng Kung University, 1st University Road, Tainan, 701, Taiwan and Institute of Biotechnology, National Cheng Kung University, 1st University Road, Tainan, 7 ...;Department of Life Sciences, National Cheng Kung University, 1st University Road, Tainan, 701, Taiwan;Department of Life Sciences, National Cheng Kung University, 1st University Road, Tainan, 701, Taiwan

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2005

Abstract

Most traditional approaches to annotating protein families are inefficient: sequences arrive at high throughput, analytic tools are complex, the literature is unordered, and results cannot be reused. Here, we describe a framework, knowledge sharing for protein families (KSPF), that uses sequence pattern data mining and knowledge management to improve upon traditional approaches. It is divided into three modules: automation, retrieval and refinement. The framework provides an environment in which biological researchers can submit an unknown protein sequence and search for information on its sub-family. Once the sub-family has been identified, the related literature and the knowledge records provided by previous users can be retrieved, and the possible functions of the protein can then be predicted from them. The proposed framework is applicable to all types of protein families. As a demonstration, we describe the use of the framework to search for a plant lipid transfer protein (PLTP). The resulting system, KS-PLTP, maps an unknown sequence to a sub-family in the PLTP knowledge base and predicts the sequence's possible function. The prediction rate of KS-PLTP reached 89.6%.
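
The workflow described in the abstract (submit a sequence, map it to a sub-family by sequence pattern, retrieve the records left by earlier users, then add new knowledge) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `SubFamily` class, the function names, the regular-expression patterns and the records are all hypothetical placeholders, and plain regex matching stands in for whatever pattern-mining method KSPF actually uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SubFamily:
    """One sub-family entry in a toy knowledge base (hypothetical structure)."""
    name: str
    pattern: str  # hypothetical regex standing in for a mined sequence pattern
    records: list = field(default_factory=list)  # literature / user notes


# Hypothetical knowledge base: names, patterns and records are invented
# placeholders, not real PLTP sub-family signatures.
KNOWLEDGE_BASE = [
    SubFamily("subfamily_A", r"C.{2}C.{3}CC", ["record A1"]),
    SubFamily("subfamily_B", r"GG.K.{2}W", ["record B1", "record B2"]),
]


def classify(sequence, knowledge_base=KNOWLEDGE_BASE):
    """Return the first sub-family whose pattern occurs in the sequence, else None."""
    seq = sequence.upper()
    for sub in knowledge_base:
        if re.search(sub.pattern, seq):
            return sub
    return None


def retrieve_records(sequence, knowledge_base=KNOWLEDGE_BASE):
    """Map a sequence to a sub-family and return (name, copy of its records)."""
    sub = classify(sequence, knowledge_base)
    if sub is None:
        return None, []
    return sub.name, list(sub.records)


def refine(subfamily_name, note, knowledge_base=KNOWLEDGE_BASE):
    """Attach a new user-supplied record to a sub-family; True if it exists."""
    for sub in knowledge_base:
        if sub.name == subfamily_name:
            sub.records.append(note)
            return True
    return False
```

In this sketch a later user who submits a sequence matching the same pattern would see any note added through `refine`, which is the reuse of earlier results that the abstract emphasizes.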