The body of knowledge recorded in published scientific papers keeps growing as research advances. Because many papers report only fragments of scientific facts, new knowledge can potentially be discovered by connecting those facts. We therefore developed a system called BioTermNet, which drafts a conceptual network using a hybrid of information extraction and information retrieval methods. In this system, two concepts are regarded as related if (a) their relationship is explicitly described in MEDLINE abstracts or (b) they co-occur distinctively in abstracts. The former uses PRIME data, which include protein interactions and functions extracted by NLP techniques; the latter uses the Singhal measure from information retrieval. Relationships that are not clearly or directly described in an abstract can be extracted by connecting multiple concepts. To evaluate the system, we tested Swanson's associations between Raynaud's disease and fish oil and between migraine and magnesium, using only abstracts published before these associations were discovered. When both the start and end concepts were given, plausible and understandable intermediate concepts connecting them were detected. When only the start concept was given, the system detected not only the target concepts (magnesium and fish oil) but also other probable concepts as related-concept candidates. Finally, the system was applied to find diseases related to the BRCA1 gene: along with diseases whose relations to BRCA1 were already known, some new potentially related diseases were detected. BioTermNet is available at http://btn.ontology.ims.u-tokyo.ac.jp. © 2007 Wiley Periodicals, Inc.
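
The idea of connecting concepts through intermediates can be illustrated with a minimal sketch. This is not BioTermNet's implementation: it assumes concepts have already been extracted per abstract, it links two concepts with a raw co-occurrence count threshold as a stand-in for the Singhal measure, and it omits the PRIME-based explicit relations entirely. The function names (`build_network`, `intermediates`, `candidates`), the `min_cooccurrence` parameter, and the placeholder concept names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_network(abstracts, min_cooccurrence=2):
    """Link two concepts that co-occur in at least `min_cooccurrence` abstracts.

    `abstracts` is an iterable of sets of concept names (assumed to be
    pre-extracted). The raw count threshold is a simplification, not the
    Singhal measure named in the abstract.
    """
    counts = defaultdict(int)
    for concepts in abstracts:
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1

    network = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_cooccurrence:
            network[a].add(b)
            network[b].add(a)
    return network


def intermediates(network, start, end):
    """Start and end given: return concepts linked to both of them."""
    return sorted(network.get(start, set()) & network.get(end, set()))


def candidates(network, start):
    """Only start given: return concepts two links away from `start`.

    Concepts already linked directly to `start` are excluded. Results are
    ranked by the number of distinct intermediate concepts, then by name.
    """
    direct = network.get(start, set())
    found = defaultdict(set)
    for mid in direct:
        for other in network.get(mid, set()):
            if other != start and other not in direct:
                found[other].add(mid)
    return sorted(found, key=lambda c: (-len(found[c]), c))


# Made-up toy input with placeholder concept names (not real findings).
toy_abstracts = [
    {"disease_x", "factor_1"}, {"disease_x", "factor_1"},
    {"disease_x", "factor_2"}, {"disease_x", "factor_2"},
    {"factor_1", "substance_y"}, {"factor_1", "substance_y"},
    {"factor_2", "substance_y"}, {"factor_2", "substance_y"},
    {"factor_2", "substance_z"}, {"factor_2", "substance_z"},
    {"disease_x", "substance_z"},  # co-occurs once only: below the threshold
]

toy_network = build_network(toy_abstracts)
print(intermediates(toy_network, "disease_x", "substance_y"))
print(candidates(toy_network, "disease_x"))
```

In this toy network, `intermediates` corresponds to the case where both start and end concepts are given, and `candidates` to the case where only the start concept is given. Ranking candidates by the number of connecting intermediates is one simple choice made for the sketch; the abstract does not state how BioTermNet ranks them.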