Biomedical association mining and validation

Authors:
Premchand Gandra;Meeta Pradhan;Mathew J. Palakal
Affiliations:
Indiana University Purdue University, Indianapolis, Indiana;Indiana University Purdue University, Indianapolis, Indiana;Indiana University Purdue University, Indianapolis, Indiana
Venue:
ISB '10 Proceedings of the International Symposium on Biocomputing
Year:
2010

Abstract

During the last decade, the volume of data published in the biomedical literature has increased exponentially. With this growth, it has become hard to manually read all the papers for required information. Many text mining algorithms and approaches have been developed to extract information from the existing literature. One important kind of information is the association between functional terms such as genes, proteins, drugs, and diseases. These associations can be causal, explicit, or implicit. One of the most common applications is mining protein-protein interactions from PubMed. The focus of the present study is to identify and validate implicit protein-protein associations, as these are hard to identify from the literature. These associations, when detected automatically, are noisy and need to be validated for their biological significance. To validate them, the associations were passed through a series of filters and an algorithm that remove the noise present in the data. In this study, we used 16 gene IDs to retrieve 32,693 documents with 193,738 sentences related to regenerative biology from the PubMed database. From these sentences, BioMap found 10,004 explicit and 30,000 implicit protein interaction pairs, which were validated using the proposed methodology. Finally, 308 implicit pairs were identified as the outcome of this methodology. These results indicate that the proposed methods can be effectively used for biological verification of implicit protein-protein interactions that are obtained through literature mining.
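
The abstract does not say how implicit pairs are derived or which filters are applied, so the following Python sketch is only an illustration under stated assumptions, not the authors' method or BioMap's. It assumes that an explicit pair is two terms co-occurring in one sentence, that an implicit pair is two terms never seen together but each linked explicitly to a shared intermediate term, and that a single hypothetical filter keeps implicit pairs supported by at least `min_shared` intermediates. All function names and the threshold are invented for this example.

```python
from collections import defaultdict
from itertools import combinations


def explicit_pairs(sentences):
    """Return term pairs that co-occur in at least one sentence.

    Each sentence is given as a list of already-recognized term names
    (assumption: entity recognition has been done upstream).
    """
    pairs = set()
    for terms in sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs.add((a, b))
    return pairs


def implicit_pairs(explicit):
    """Return {(a, c): intermediates} for pairs linked only through a shared partner."""
    neighbors = defaultdict(set)
    for a, b in explicit:
        neighbors[a].add(b)
        neighbors[b].add(a)

    candidates = {}
    for b, linked in neighbors.items():
        for a, c in combinations(sorted(linked), 2):
            if (a, c) not in explicit:  # never observed together directly
                candidates.setdefault((a, c), set()).add(b)
    return candidates


def filter_by_support(candidates, min_shared=2):
    """Hypothetical noise filter: keep pairs with enough shared intermediates."""
    return {pair: via for pair, via in candidates.items() if len(via) >= min_shared}


if __name__ == "__main__":
    # Toy input with placeholder protein names, not data from the study.
    toy_sentences = [
        ["P1", "P2"],
        ["P2", "P3"],
        ["P1", "P4"],
        ["P4", "P3"],
        ["P3", "P5"],
    ]
    found = implicit_pairs(explicit_pairs(toy_sentences))
    for pair, via in sorted(filter_by_support(found).items()):
        print(pair, "via", sorted(via))
```

A chain of such filters, each narrowing the candidate set, is one way to picture how a large pool of noisy implicit pairs could be reduced to a small validated set; the actual filters and validation algorithm are described in the paper itself rather than in the abstract.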