Knowledge discovery based on an implicit and explicit conceptual network

  • Authors:
  • Asako Koike; Toshihisa Takagi

  • Affiliations:
  • Department of Computational Biology, Graduate School of Frontier Science, The University of Tokyo and Central Research Laboratory, Hitachi Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo, 185-8601, Japan; Department of Computational Biology, Graduate School of Frontier Science, The University of Tokyo, Kiban-3A1 (CB01) 5-1-5, Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2007

Abstract

The amount of knowledge accumulated in published scientific papers continues to grow as scientific research progresses. Because many papers report only fragments of scientific facts, new knowledge may be discovered by connecting these facts. We therefore developed a system called BioTermNet that builds a conceptual network with a hybrid of information extraction and information retrieval methods. In this system, two concepts are regarded as related if (a) their relationship is explicitly described in MEDLINE abstracts or (b) they co-occur distinctively in abstracts. The former uses PRIME data, which include protein interactions and functions extracted by NLP techniques; the latter uses the Singhal measure from information retrieval. Relationships that are not clearly or directly described in any single abstract can be extracted by connecting multiple concepts. To evaluate the system, Swanson's associations between Raynaud's disease and fish oil and between migraine and magnesium were tested using abstracts published before these associations were discovered. When start and end concepts were given, plausible and understandable intermediate concepts connecting them were detected. When only the start concept was given, not only the target concepts (magnesium and fish oil) but also other probable concepts were detected as related concept candidates. Finally, the system was applied to find diseases related to the BRCA1 gene. In addition to diseases whose relations to BRCA1 were already known, some new potentially related diseases were detected. BioTermNet is available at http://btn.ontology.ims.u-tokyo.ac.jp. © 2007 Wiley Periodicals, Inc.
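
The two search modes described in the abstract (start and end concepts given, or only the start concept given) can be illustrated with a minimal sketch. This is not the BioTermNet implementation: the function names, the hand-made `TOY_LINKS` data, and the ranking by number of shared intermediate concepts are all assumptions made for illustration. In the actual system, links come from PRIME relations and Singhal-measure co-occurrence, neither of which is reproduced here.

```python
from collections import defaultdict

# Hand-made toy links for illustration only (not output of BioTermNet).
TOY_LINKS = [
    ("Raynaud's disease", "blood viscosity"),
    ("Raynaud's disease", "platelet aggregation"),
    ("blood viscosity", "fish oil"),
    ("platelet aggregation", "fish oil"),
    ("platelet aggregation", "hypothetical concept X"),
]


def build_network(pairs):
    """Build an undirected concept network from (concept, concept) pairs."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


def intermediates(network, start, end):
    """Start and end given: return concepts directly linked to both."""
    return sorted(network.get(start, set()) & network.get(end, set()))


def candidates(network, start):
    """Only start given: return concepts two links away from start that are
    not directly linked to it, ranked by how many intermediates they share
    (an assumed, simplified ranking)."""
    direct = network.get(start, set())
    counts = defaultdict(int)
    for mid in direct:
        for target in network.get(mid, set()):
            if target != start and target not in direct:
                counts[target] += 1
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


net = build_network(TOY_LINKS)
print(intermediates(net, "Raynaud's disease", "fish oil"))
print(candidates(net, "Raynaud's disease"))
```

On this toy data, the first call lists the concepts that bridge the start and end concepts, and the second ranks indirectly connected concepts, so the target concept appears alongside other candidates, mirroring the behaviour the abstract reports.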