Motivation: With the rapid advancement of biomedical science and the development of high-throughput analysis methods, extracting various types of information from biomedical text has become critical. Because automatic functional annotation of genes helps interpret large amounts of high-throughput data efficiently, demand for automatic extraction of gene function information from text has been increasing.

Results: We have developed a method for automatically extracting the biological process functions of genes/proteins/families, based on Gene Ontology (GO), from text using a shallow parser and sentence structure analysis techniques. When gene/protein/family names and their functions are described in ACTOR (doer of action) and OBJECT (receiver of action) relationships, the corresponding GO-IDs are assigned to the genes/proteins/families. The gene/protein/family names are recognized using the gene/protein/family name dictionaries developed by our group. To achieve wide recognition of gene/protein/family functions, we semi-automatically gather functional terms based on GO using co-occurrence, collocation similarities and rule-based techniques. A preliminary experiment demonstrated that our method has an estimated recall of 54-64% with a precision of 91-94% for functions actually described in abstracts. When applied to PubMed, it extracted over 190 000 gene-GO relationships and 150 000 family-GO relationships for major eukaryotes.

Availability: The extracted gene functions are available at http://prime.ontology.ims.u-tokyo.ac.jp

Contact: akoike@hgc.jp
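
The ACTOR/OBJECT assignment described in the Results can be illustrated with a very small sketch. This is not the authors' implementation: a regular expression stands in for the shallow parser, and the dictionaries `GENE_NAMES`, `GO_TERMS` and `ACTION_VERBS`, as well as the function `extract_gene_go`, are hypothetical names invented for this example. It only shows the idea of assigning a GO-ID when a dictionary-matched gene name acts (via a verb) on a dictionary-matched functional term.

```python
import re

# Hypothetical toy dictionaries (assumptions for illustration only);
# the dictionaries built by the authors are not reproduced here.
GENE_NAMES = {"RAD51", "p53"}
GO_TERMS = {
    "DNA repair": "GO:0006281",
    "apoptosis": "GO:0006915",
}
# Verbs that put the gene in the ACTOR role and the function in the OBJECT role.
ACTION_VERBS = ("induces", "promotes", "mediates", "regulates")

VERB_ALTERNATION = "|".join(ACTION_VERBS)


def extract_gene_go(sentence):
    """Return sorted (gene, GO-ID) pairs for 'GENE verb [modifiers] FUNCTION' patterns.

    A regex is used here as a crude stand-in for shallow parsing: the gene
    must directly precede an action verb, and the functional term must
    follow within at most three intervening words.
    """
    pairs = []
    for gene in GENE_NAMES:
        for term, go_id in GO_TERMS.items():
            pattern = (
                rf"\b{re.escape(gene)}\b\s+(?:{VERB_ALTERNATION})\s+"
                rf"(?:\w+\s+){{0,3}}?{re.escape(term)}\b"
            )
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                pairs.append((gene, go_id))
    return sorted(pairs)


if __name__ == "__main__":
    print(extract_gene_go("RAD51 promotes DNA repair in yeast."))
    print(extract_gene_go("Apoptosis was observed; p53 was not measured."))
```

In this sketch, simple co-occurrence is not enough: the second sentence mentions both a gene and a functional term but yields no pair, because the gene is not in the ACTOR position of an action verb. A real system would need proper syntactic analysis to handle passives, coordination and negation.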